Abstract

A  detailed  examination  was  performed  on  several 
commonly  applied  atmospheric  stability  indices  and  lightning 
activity from 1993 to 2000 to determine the indices'
usefulness as predictive tools for determining cloud-to-ground
lightning activity. Predetermined radii of 50
nautical  miles  around  upper-air  stations  in  the  Midwest  U.S. 
were  used  for  the  lightning  summaries. 

Also  explored  is  an  improvement  upon  the  commonly 
accepted  thresholds  of  the  stability  indices  as  general 
thunderstorm  indicators.  An  improvement  was  found  and  new 
threshold  ranges  were  developed  for  relating  stability  index 
values  to  lightning  occurrence. 

Traditional  statistical  regression  methods  failed  to 
find  a  significant  predictive  relationship.  By  examining 
new  techniques  of  data  analysis,  it  was  found  that  the 
detection  and  classification  abilities  of  decision  trees 
derived  from  the  data-mining  field  best  served  the  purposes 
of  this  study.  Decision  trees  were  examined  on  the  large 
available  database  and  significant  results  were  found, 
resulting  in  the  development  of  a  lightning  forecast  tool 
for both the probability of lightning occurrence and its
intensity. The predictive ability of the decision trees
used  in  this  study  for  lightning  detection  often  exceeded 
80-90%  for  most  locations  with  a  high  degree  of  confidence. 

The  most  significant  features  of  the  decision  tree 
results  were  formulated  into  a  forecast  prediction  tool  with 
summary  results  for  each  location  analyzed.  These  are 
specified  both  graphically  and  textually  in  a  user-friendly 
format  for  forecasters  to  use  as  a  "ready  to  use"  predictive 
tool  for  forecasting  lightning  activity. 

The  results  of  this  study  using  classification  and 
regression  trees  were  significant  enough  to  implement 
immediately  as  a  forecast  tool  for  the  operational  weather 
forecast  environment.  Appendix  A  of  this  study  is  written 
as  a  "ready-to-use"  forecast  tool  for  weather  forecasters. 

It  is  suggested  that  Air  Force  Weather  units  in  the  Midwest 
U.S.  use  this  "innovative"  forecast  tool  immediately  for 
forecasting  lightning  activity. 


DEVELOPMENT OF PREDICTORS FOR CLOUD-TO-GROUND LIGHTNING
ACTIVITY USING ATMOSPHERIC STABILITY INDICES

I. Introduction

Thunderstorms  with  their  associated  lightning  impact 
all  aspects  of  military  operations.  For  United  States  Air 
Force  (USAF)  weather  forecasters,  flight  operations  are  most 
affected  by  lightning.  For  safety  reasons,  lightning 
activity  in  the  area  will  halt  most  operations  involving 
aircraft.  Problems  associated  with  lightning  are  not 
limited  to  Department  of  Defense  (DoD)  operations  since  many 
civil  functions  are  significantly  affected  as  well,  such  as 
agriculture,  transportation,  and  especially  the  power/energy 
industry.  The  power  industry  relies  heavily  on  thunderstorm 
forecasts,  especially  if  significant  lightning  is 
anticipated.  For  example,  inclement  weather  is  the  single 
largest  cause  of  power  outages,  equating  to  as  many  as  40% 
of  all  interruptions.  If  thunderstorms  are  possible,  great 
expenditure  is  made  by  this  industry  to  put  stand-by  workers 
on  call  and  get  back-up  generators  started  to  minimize  power 
interruptions.  In  addition,  widespread  cooling  caused  by 
evaporation  during  thunderstorm/rain  events  drastically 
reduces customer demand for air conditioning
during the summer season in the U.S. These effects are most
significant  in  highly  populated  regions.  Mismatches  between 
generation  capacity  and  customer  demand  either  waste 
valuable resources or require expensive increases in supply
through purchases of additional power at inflated wholesale
prices (Dempsey et al., 1998).

Understanding  and  predicting  thunderstorms  and 
associated  cloud-to-ground  (CG)  lightning  activity  in  an 
operational environment proves both difficult and taxing,
especially  when  considering  the  time  constraints  most 
operational  forecasters  operate  under.  This  research 
examines  atmospheric  stability  indices  as  possible 
predictive  tools  for  CG  lightning  activity  surrounding 
individual  upper-air  stations  in  the  Midwest  region  of  the 
United  States. 

1.1  Statement  of  the  Problem 

An  upper-air  station  is  a  weather  station  that  observes 
and  disseminates  weather  balloon  soundings  from  which  the 
parameters  for  atmospheric  stability  indices  are  derived. 
Balloon  soundings  indicate  the  state  of  the  atmosphere  by 
measuring  the  temperature,  humidity,  and  winds  as  functions 
of  pressure  (or  height)  as  a  balloon  ascends  up  through  the 
atmosphere. They are usually plotted, manually or by automated
means, on a Skew-T log-P diagram or provided in raw data format
(AWS/TR-79, 1990).

Some  of  the  most  commonly  calculated  indices  are  the 
Lifted Index (LI), the K-Index (KI), and the convective
available potential energy (CAPE). Seven indices were
chosen and calculated for the locations used in this study.
Operationally, it is usually left to the discretion of the
forecaster to decide which index to use and which one is
most representative of their region or particular weather
regime.

Unfortunately,  on  particular  days,  certain  indices  may 
indicate  severe  potential,  while  on  others  they  only 
indicate  a  slight  risk  of  CG  lightning  activity.  Both 
conditions  tend  to  occur  with  varied  results.  This  creates 
confusion  as  to  the  utility  of  the  indices  for  the  current 
forecast  location  or  forecast  region  that  is  being  examined. 
Experienced  forecasters  know  that  when  analyzing  the 
forecast  environment  for  the  potential  of  severe  weather, 
the  indices  account  for  a  large  portion  of  the  analysis  and 
are  a  good  starting  point  in  the  formulation  of  their 
forecast.  However,  this  study  shows  that  the  indices 
specify  a  wide  range  of  values  for  both  days  with  CG 
lightning activity and days without any lightning activity
at all.

For  very  active  CG  lightning  days,  which  may  or  may  not 
be  associated  with  severe  weather  at  the  surface,  there  is 
thought  to  be  a  noticeable  relationship  to  a  limited  range 
of  unstable  index  values.  The  significance  of  this 
relationship  has  been  the  focus  of  studies  accomplished  on 
stability  indices  in  the  past,  but  never  with  substantial 
justification (Coleman, 1990). This study attempts to
definitively assess some of the most common indices used in
operational weather forecasting and ultimately to develop
forecast tools in which these indices are suitable as
predictors of CG lightning activity for individual locations
or regions in the Midwest.

Experienced  forecasters  seem  to  have  their  "favorite" 
stability index, but unfortunately forecasters are unable to
determine which stability index to rely upon the most for
every weather regime being forecast. Furthermore,
even experienced forecasters should not rely totally on just
one of the stability indices for their forecast and may not
want to consider using them at all under certain
conditions.

Stability  indices  have  historically  been  used  to  assess 
the  threat  and  potential  severity  of  thunderstorms  (with 
which CG lightning activity is clearly associated).
However, it appears no previous studies have assessed the
degree to which stability indices may be used as predictive
tools for CG lightning activity or its intensity, as measured
by the highly dependable and accurate National
Lightning Detection Network (NLDN), especially over the
Midwest region of the United States.

The goal of this research, then, is to ascertain the
best possible relationship between stability indices and CG
lightning, for use as a forecast tool in predicting either
the occurrence of CG lightning or the
amount of activity surrounding upper-air stations in the
Midwest. Any predictive relationships found will increase
weather forecasters' confidence in the indices and in their
ability to predict CG lightning activity. Any increase in
the  ability  to  accurately  predict  CG  lightning  events  and 
activity  will  be  beneficial  to  all  DoD  and  civil  operations 
affected  by  CG  lightning  activity. 


1.2  Research  Objectives 

In  the  absence  of  adequate  predictive  tools  for 
forecasting  CG  lightning  events,  this  study  examines  the  use 
of atmospheric stability indices as a means to discover
and exploit any significant relationships.
The specific tasks necessary to achieve the goal of this
study were:

1. to determine the most useful radius (10nm, 25nm, or
50nm) of CG lightning summaries around a station
for examining relationships, and to combine the
most homogeneous months of lightning activity to
maximize the usefulness of the dataset for each
upper-air location;

2.  to  analyze  the  stability  indices  and  formulate  an 
improved  range  of  values  to  combine  with  CG 
lightning  occurrences; 

3.  to  examine  the  CG  lightning  data  and  stability 
indices  for  any  predictive  relationships  by  using 
statistical  regression  (linear  and  non-linear) 
techniques;

4. to exploit data mining techniques to introduce new
predictive  techniques  and  to  establish  the  most 
significant  threshold  values  among  the  stability 
indices  using  the  detection  and  classification 
abilities  of  decision  trees;  and, 

5. to formulate a forecast matrix from the
predictive relationships found among the most
significant features, as determined by the decision
tree results.
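
Task 4 relies on decision trees to find threshold values in the indices. As a minimal sketch of that idea (not the software or data used in this study), the Python function below searches a single index for the one split that best separates lightning days from no-lightning days, using size-weighted Gini impurity as the criterion. The names `gini` and `best_threshold` and all sample values are hypothetical and purely illustrative.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = no lightning, 1 = lightning)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, impurity) for the split 'value <= threshold'
    that minimizes the size-weighted Gini impurity of the two groups."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # cannot split between identical values
        threshold = (low + high) / 2.0
        left = [lab for val, lab in pairs if val <= threshold]
        right = [lab for val, lab in pairs if val > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical Lifted Index values and lightning flags (made up for illustration).
li_values = [4.0, 2.5, 1.0, 0.5, -1.0, -2.0, -3.5, -5.0]
lightning = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_threshold(li_values, lightning)
```

A full decision tree repeats this search recursively on each resulting group and across all available indices; the single split shown here is only the basic building block.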

II. Background and Literature Review


2.1  Lightning  Background 

The lightning activity data used in this study
include cloud-to-ground (CG) strikes only, within a 50
nautical mile (nm) radius of each predefined upper-air
station in the Midwest, as disseminated by the National
Lightning Detection Network (NLDN). Comparisons were made
among CG strike counts at different radii (50nm, 25nm, and 10nm)
in twelve-hour (12Z to 00Z and 00Z to 12Z) increments to
coincide with the matching upper-air sounding times.
It was quickly determined that the
lightning data at 50nm were most representative of lightning
in the general vicinity of a station. The CG lightning
strike summaries for the 25nm and 10nm radii seemed to
capture  too  few  occurrences  and  therefore  less  significant 
relationships  could  be  inferred  between  the  indices  and  CG 
strike  activity. 
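
The strike summaries described above can be sketched in Python as follows. This is an illustration only, under stated assumptions: the NLDN record layout is not described in this section, so the `(utc_datetime, lat, lon)` tuple, the function names, the Earth-radius constant, and the sample coordinates are all hypothetical. Distances are computed with the haversine formula, and each strike inside the radius is assigned to a 12-hour UTC window.

```python
import math
from datetime import datetime

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (assumed constant)


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def count_strikes(station, strikes, radius_nm=50.0):
    """Count strikes within radius_nm of station, keyed by 12-hour UTC window.

    station: (lat, lon) in decimal degrees.
    strikes: iterable of (utc_datetime, lat, lon) tuples (hypothetical layout).
    """
    counts = {"00Z-12Z": 0, "12Z-00Z": 0}
    for when, lat, lon in strikes:
        if distance_nm(station[0], station[1], lat, lon) <= radius_nm:
            window = "00Z-12Z" if when.hour < 12 else "12Z-00Z"
            counts[window] += 1
    return counts


# Made-up station and strike positions, for illustration only.
station = (35.0, -97.0)
strikes = [
    (datetime(1995, 6, 1, 3, 15), 35.2, -97.1),   # roughly 13 nm away
    (datetime(1995, 6, 1, 18, 40), 35.5, -97.0),  # roughly 30 nm away
    (datetime(1995, 6, 1, 20, 5), 37.0, -97.0),   # roughly 120 nm away
]
counts = count_strikes(station, strikes)
```

Calling `count_strikes` again with `radius_nm=25.0` or `radius_nm=10.0` gives the smaller-radius summaries, which shows directly how the counts shrink as the radius is reduced.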

A  radius  of  50nm  was  chosen  to  represent  the  atmosphere 
around  each  upper-air  station  for  comparison  reasons  and  was 
used  as  the  starting  point  to  assess  any  potential  utility 
of  the  stability  indices  for  predicting  CG  lightning 
activity. CG lightning is referred to throughout because,
as will be shown later, lightning activity data from the
NLDN consist of CG lightning strikes only. No intra-cloud
lightning measurements are included because of limitations in
sensor threshold measurements (Cummins et al., 1998).

There  are  limitations  to  the  lightning  data  used  in 
this  study.  Progress  in  detecting  CG  lightning  strikes  has 
been well documented in a recent publication by Cummins et
al. (1998), who summarized the detection efficiency of the
network from its past to its present form. Prior to 1992,
GeoMet  Data  Services  (GDS) ,  the  organization  that  maintained 
the  network  during  that  time,  estimated  that  the  average 
location  accuracy  of  CG  lightning  strike  locations  varied 
from  8  to  16  km  in  the  NLDN.  The  flash  detection  efficiency 
during  this  same  period  was  around  70%,  using  first  stroke 
peak currents of greater than 5 kiloamperes (kA). Model
estimates exclude flashes with peak currents less
than 5 kA, which are not considered CG flashes because of large
uncertainties in the peak current distribution at lower
amperages. In early 1992, GDS calibrated the sensors,
increasing  the  accuracy  of  the  network  to  4  to  8  km,  with  a 
flash  detection  efficiency  of  65  to  80%.  Once  an  upgrade  in 
1995  was  completed,  the  location  accuracy  improved  to  1  to  2 
km,  with  a  first  stroke  detection  efficiency  of  80  to  90%. 
However, manual video verifications showed detection
efficiencies of 84% in 1994, prior to the upgrade, and 85%
in 1995, after the upgrade.
Therefore, significant discrepancies between data collected
before and after the 1995 upgrade are not expected, and the
earlier data appear acceptable (Wacker and Orville, 1999).

Prior to the establishment of the NLDN, thunderstorm
events were documented through visual observations,
perhaps supplemented by radar and satellite information.
The most significant deficiency of this approach
was the timeliness and accuracy of reporting. The NLDN
alleviates  these  potential  inaccuracies  by  providing 
automated  near  real-time  reporting  of  CG  lightning  data  to 
forecasters.  Since  1991,  upgrades  to  NLDN  sensors  have 
increased  the  accuracy  of  stroke  detection  significantly. 
The  most  recent  upgrade,  completed  in  1995,  reduced  the 
total  number  of  sensors  from  130  to  106  because  of  an 
increase  in  the  effective  range  of  the  existing  sensors 
(Cummins et al., 1998). The location accuracy has been
improved  by  a  factor  of  4  to  8  since  1991,  resulting  in  a 
median  location  accuracy  of  approximately  500  meters  at  its 
best.  The  detection  efficiency  increased  from  65-80%  in 
1992-1994  to  80-90%  after  the  1995  upgrade.  This  is 
significant  since  most  stability  indices  and  lightning  data 
used  in  this  study  included  the  1993-1994  period  of  record. 
However, there were a few locations where only data since
1995 were available (see Table 1).


Table 1. Data availability for each location used in this study.

WMO       ICAO  Location              State  Elev (m)  Lat      Lon       Period of Record
72248     SHV   SHREVEPORT REGIONAL   LA     79        32.28 N  93.49 W   2/95-5/00
72249     FWD   FORT WORTH            TX     196       32.50 N  97.18 W   7/94-5/00
72340     LZK   NORTH LITTLE ROCK     AR     165       34.50 N  92.15 W   1/93-5/00
*72355    FSI   FORT SILL (Military)  OK     362       34.39 N  98.24 W   1/93-5/00
72357     OUN   NORMAN/WESTHEIMER     OK     357       35.13 N  97.27 W   1/93-5/00
**72363   AMA   AMARILLO ARPT(AWOS)   TX     1099      35.14 N  101.42 W  1/93-5/00
72440     SGF   SPRINGFLD MUNI(AWS)   MO     387       37.14 N  93.23 W   5/95-5/00
72451     DDC   DODGE CITY(AWOS)      KS     790       37.46 N  99.58 W   1/93-5/00
72456     TOP   TOPEKA/BILLARD MUNI   KS     270       39.04 N  95.37 W   5/95-5/00
72558     OAX   OMAHA/VALLEY          NE     350       41.19 N  96.22 W   7/94-5/00
72562     LBF   N. PLATTE/LEE BIRD    NE     849       41.08 N  100.41 W  1/93-5/00
72662     RAP   RAPID CTY RGNL ARPT   SD     964       44.05 N  103.03 W  1/93-5/00
74455     DVN   DAVENPORT UPPER-AIR   IA     229       41.37 N  90.35 W   3/95-5/00

*  12Z sounding only
** SWEAT index missing 11/98-5/00


2.2  Stability  Index  Background 

Weather balloons carrying Styrofoam-boxed
instrument packages called rawindsondes have been used to gather
atmospheric  measurements  of  the  vertical  temperature, 
moisture,  and  wind  profiles  (soundings)  above  a  location 
since  the  early  1900s.  Rawindsondes  have  been  the 
foundation  of  the  global  upper-air  observing  system  with 
more than 1,000 rawindsonde stations operated by 92
countries as of the early 1990s (NOAA, 1992). Most of these
upper-air  stations  in  the  United  States  launch  weather 
balloons twice a day, once at 00Z (Coordinated Universal
Time (UTC), or Greenwich Mean Time (GMT)) and again
at 12Z. Across the continental United States, weather
balloons are launched from over 100 different locations,
and many different calculations are made from the
environmental  data  gathered.  These  range  from  the  complex 
analysis/forecast  models  developed  by  weather  organizations 
to  the  derived  stability  indices  used  in  this  research 
effort.  Due  to  the  inaccuracy,  at  times,  of  these  weather 
models,  research  to  improve  them  is  a  continuous  effort. 
Stability  indices,  then,  are  an  essential  part  of  the 
analysis/forecast  process  (especially  for  convective  weather 
forecasting)  and  are  used  in  combination  with  the 
analysis/forecast  models  to  determine  the  current  and 
forecasted  states  of  the  ever-changing  atmosphere.  Thirteen 
sounding  locations  in  the  Midwest  were  chosen  for  this  study 
from  the  various  government  and  military  sounding  sites 
indicated  in  Figure  1.  While  the  results  of  all  13 
locations  are  presented,  this  study  focuses  on  two  of  the 
sites,  which  are  deemed  representative  of  the  entire 
regional climate regime. One, a National
Weather Service (NWS) sounding site in Oklahoma, is Norman (OUN); the
other, in west-central Nebraska, is North Platte (LBF).


Figure  1.  United  States  upper  air  stations  along  with  their 
corresponding  ICAO  (International  Civil  Aviation 
Organization)  identifiers.  The  Midwest  sounding 
sites  included  in  this  study  are  circled. 


2.3  Atmospheric  Sounding  Data  Reliability 

The  soundings,  derived  from  the  rawindsondes  discussed 
previously,  refer  to  a  profile  of  vertical  distribution 
(from  a  single  location)  of  the  pressure,  temperature,  dew 
point  temperature,  wind  direction,  and  wind  speed  from 
measurements taken by a rawindsonde as it ascends
near the site where the balloon was launched. Depending on
the strength of the winds aloft, though, the information
gathered is usually not representative of the atmosphere
immediately  over  the  launch  site.  It  would  be  ideal  to  have 
an  exact  replication  of  the  current  state  of  the  atmosphere 
directly  above  each  measurement  location.  However,  because 
strong  winds  aloft  blow  the  balloon  a  considerable  distance 
downstream  from  where  it  was  released,  this  is  usually  not 
the case. The measurements, though, must be considered
representative of the sounding location, because no location
error corrections are made to rawindsonde observations
(Andra, 2000). The location errors are especially large
when strong upper-level winds blow the sounding
balloon farther away from the launch site as it rises into
the  atmosphere.  This  makes  the  data  even  less 
representative  of  the  location  from  which  it  originated. 
Fortunately, most of the indices calculated for this study
use temperature and moisture measurements from the 850
and 500 millibar (mb) pressure levels, which equate to
roughly 5,000 and 18,000 feet, respectively, in the standard
atmosphere.  Rawindsondes  typically  take  measurements  well 
above  300mb  (over  30,000  feet),  where  location  errors  can  be 
quite  large.  For  purposes  of  this  study,  it  is  assumed,  as 
it is for the national rawindsonde network, that these
errors are minimal and thus are not considered significant,
especially in a data-dense region such as the Midwest with
little terrain complexity.


2.4  Stability Indices as Predictors

There  are  numerous  other  severe  weather  indices  in  use, 
many  of  which  are  used  at  the  National  Severe  Storms 
Forecast  Center  by  forecasters  who  specialize  in  severe 
weather  forecasting.  The  indices  presented  in  this  research 
effort  are  those  routinely  used  by  forecasters  to  evaluate 
the  stability  or  instability  of  the  atmosphere.  The 
stability  indices  can  be  thought  of  as  the  analyzed 
convective  potential  of  a  sounding  expressed  as  a  single 
numerical value. Miller et al. (1972) developed the
generally accepted stability index thresholds for the
Midwest that were used in this study. The stability indices
were further classified into the threshold categories listed
in Table 2. From these categories, an improved range of
values for the occurrence of CG lightning is suggested in
the next chapter. But first, the calculation of each stability
index is discussed.

Table 2. Suggested range of index values as general
thunderstorm indicators (AFWA, 1998).

Index               Region best applied            Weak (Low)    Moderate      Strong (High risk)
CAPE                East of Rockies                300 to 1000   1000 to 2500  2500 to 5300
K-Index             East of Rockies in moist air   20 to 26      26 to 35      > 35
KO-Index            Cool, moist climates (Pacific  > 6           2 to 6        < 2
Lifted Index        All                            0 to 2        -3 to -5      < -5
Showalter           CONUS                          > 3           2 to -2       < -3
Total Totals        East of Rockies                44 to 45      46 to 48      > 48
SWEAT (for Severe)  Midwest and Plains             < 275         275 to 300    > 300
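
To show how the Table 2 ranges might be applied mechanically, here is a minimal Python sketch for two of the indices, CAPE and the K-Index, whose ranges are contiguous. The function names and the "below weak" label are hypothetical. The table does not say how to treat a value that falls exactly on a shared boundary (for example, a K-Index of 26) or a CAPE above 5300; the choices made below are assumptions, noted in the comments.

```python
def categorize_cape(cape_j_per_kg):
    """Table 2 category for CAPE (J/kg).

    Assumptions: a shared boundary value goes to the higher category,
    and values above the tabulated 5300 upper limit count as strong.
    """
    if cape_j_per_kg >= 2500.0:
        return "strong"
    if cape_j_per_kg >= 1000.0:
        return "moderate"
    if cape_j_per_kg >= 300.0:
        return "weak"
    return "below weak"


def categorize_k_index(k_index):
    """Table 2 category for the K-Index.

    Strong is strictly greater than 35, as tabulated; assigning the
    shared boundary value 26 to 'moderate' is an assumption.
    """
    if k_index > 35.0:
        return "strong"
    if k_index >= 26.0:
        return "moderate"
    if k_index >= 20.0:
        return "weak"
    return "below weak"
```

For example, `categorize_cape(1500)` returns `"moderate"` and `categorize_k_index(36)` returns `"strong"`.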

2.5  Stability  Index  Calculations 

Convective  Available  Potential  Energy  (CAPE) - 

CAPE  is  a  measure  of  the  amount  of  buoyant  energy 
available  to  accelerate  a  parcel  of  air  vertically.  CAPE  is 
directly  related  to  the  maximum  potential  vertical  speed 
within  an  updraft  or  a  summation  of  the  amount  of  buoyancy 
(not accounting for drag or non-adiabatic effects). Higher
values indicate greater potential for severe weather.
Observed values in thunderstorm environments often exceed
1,000 joules per kilogram (J/kg), and in extreme cases may
exceed 5,000 J/kg. However, as with the other indices, a
wide range of values is associated with a wide range of
weather phenomena, including lightning activity.

CAPE is represented on a skew-T log-P diagram as the area
enclosed between the environmental (sounding) temperature
profile and the lifted-parcel temperature profile from the LFC (Level
of Free Convection) to the EL (Equilibrium Level) (AWS
TR-79/006, 1979). This area, often called the positive area,
is  directly  related  to  positive  buoyancy.  This  positive 
area  represents  the  maximum  potential  strength  of  updrafts 
within  a  thunderstorm,  should  one  develop. 

CAPE  values  of  greater  than  1,500  J/kg,  dependent  upon 
location  and  season,  represent  enough  energy  to  produce 
thunderstorms.  A  value  greater  than  3,000  J/kg  represents 
enough energy to produce strong thunderstorms. Negative
values of CAPE denote a relatively stable atmosphere and are
referred to as Convective Inhibition (CIN), which is
computed as the negative area on the sounding, if it exists
(AWS TR-79/006, 1979). CIN was not computed for this study.
Knowledge of a CAPE profile or the shape of a sounding also
has some implications, but was not considered. For example,
two soundings may have the same CAPE values but different
profile shapes (South African Weather Bureau, 2000). This
study therefore utilizes the positive values of CAPE in
comparisons  with  CG  lightning  activity. 
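
The positive area can be approximated numerically. The sketch below uses the standard textbook definition, CAPE = integral from LFC to EL of g (Tp - Te) / Te dz, which is not written out in this section, together with the commonly quoted bound w = sqrt(2 CAPE) on updraft speed. It is not the AFCCC algorithm used in this study: it takes parcel and environment temperatures as given inputs, ignores the virtual temperature correction, and the function names are hypothetical.

```python
import math

G = 9.80665  # standard gravity, m/s^2


def cape_positive_area(heights_m, t_env_k, t_parcel_k):
    """Trapezoidal estimate of CAPE (J/kg) from matched profile levels.

    Buoyancy g * (Tp - Te) / Te is clipped at zero, so only levels where
    the parcel is warmer than the environment contribute; layers that
    straddle the LFC or EL are therefore only approximated.
    """
    cape = 0.0
    for i in range(len(heights_m) - 1):
        b_low = max(0.0, G * (t_parcel_k[i] - t_env_k[i]) / t_env_k[i])
        b_high = max(0.0, G * (t_parcel_k[i + 1] - t_env_k[i + 1]) / t_env_k[i + 1])
        dz = heights_m[i + 1] - heights_m[i]
        cape += 0.5 * (b_low + b_high) * dz
    return cape


def max_updraft_speed(cape_j_per_kg):
    """Theoretical upper bound on updraft speed (m/s): w = sqrt(2 * CAPE)."""
    return math.sqrt(2.0 * cape_j_per_kg)
```

Because drag and entrainment are ignored, the speed returned by `max_updraft_speed` is an upper bound rather than a forecast of actual updraft strength.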

Showalter Stability Index (SSI) -

The  SSI  (Showalter,  1953)  is  a  measure  of  the  potential 
instability  in  the  850mb  to  500mb  layer.  The  SSI  may  be 
unrepresentative  if  significant  amounts  of  moisture  reside 
below  850mb  with  dry  air  residing  above.  In  this  case  the 
SSI  would  not  be  able  to  detect  the  resulting  instability. 
SSI  is  the  stability  index  most  commonly  used  by  military 
and  other  forecasters.  It  indicates  the  general  stability 
of  an  air  mass  but  should  not  be  used  when  a  frontal 
boundary  or  a  strong  inversion  is  present  between  the  850mb 
and  500mb  levels.  SSI  is  computed  using  the  layer  between 
850mb  and  500mb  as  follows: 

SSI  =  T500  -  TP500  (1) 


where, 

• T500 = the measured temperature in degrees Celsius at
500mb

•  TP500  =  temperature  in  degrees  Celsius  of  an  air  parcel 
lifted  moist  adiabatically  from  the  850mb  lifted 
condensation  level  to  500mb 


Lifted  Index  (LI)  - 

The  LI  (Galway,  1956)  is  a  measure  of  the  potential 
instability  from  the  surface  to  the  500mb  level.  It  is  very 
similar to the SSI, but instead of using the arbitrary
choice of the 850mb level, it is usually computed by lifting
a parcel that has the mean mixing ratio of the lowest 3,000
feet of the sounding (found by equal-area averaging), which
better accounts for the
available low-level moisture below the 850mb level. There
are various methods used to determine the initial level.
Some  methods  use  the  maximum  forecasted  afternoon 
temperature  or  the  mean  sounding  temperature  in  the  lower 
levels  if  significant  heating  or  cooling  is  not  expected 
during the afternoon. The algorithm used at the Air Force
Combat Climatology Center (AFCCC) to compute the LI for this
study uses the average mixing ratio in the lowest 3,000 feet.

A  common  measure  of  atmospheric  instability,  the  LI  is 
obtained  by  computing  the  temperature  that  air  near  the 
ground  would  have  if  it  were  lifted  to  500mb  (approximately 
18,000  feet  for  the  standard  atmosphere)  and  comparing  that 
temperature to the actual temperature at that level.
Positive values reflect stable conditions while negative
values reflect unstable conditions (the parcel is warmer
than its environment, so it will continue to rise). It is
computed as follows:

LI = T(500mb environment) - T(500mb parcel)  (2)

The LI is measured in degrees C, where "T(500mb
environment)"  represents  the  500mb  environmental  temperature 
and  "T(500mb  parcel)"  is  the  rising  air  parcel's  500mb 
temperature.  If  the  lifted  air  parcel  is  warmer  than  its 
surrounding  environmental  temperature  then  it  should 
continue  to  rise.  Thus,  negative  values  indicate 
instability  and  the  more  negative,  the  more  unstable  the  air 
is,  and  the  stronger  the  updrafts  are  likely  to  be  with  any 
developing thunderstorm(s).
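
Equations (1) and (2) share the same form, environment temperature minus parcel temperature at 500mb, and differ only in which parcel is lifted. A minimal Python sketch of that sign convention follows. The function names and sample temperatures are hypothetical, and the parcel-lifting step that produces the parcel temperature is assumed to have been done elsewhere.

```python
def stability_index(t500_environment_c, t500_parcel_c):
    """Environment minus parcel temperature at 500mb, in degrees Celsius.

    Matches the form of Equations (1) and (2); which parcel was lifted
    (850mb air for SSI, a low-level mean for LI) is up to the caller.
    """
    return t500_environment_c - t500_parcel_c


def is_unstable(index_value):
    """Negative values mean the lifted parcel is warmer than its environment."""
    return index_value < 0.0


# Made-up temperatures for illustration: environment -12 C, parcel -8 C.
li = stability_index(-12.0, -8.0)
```

Here the parcel is 4 degrees warmer than its surroundings, so the index is -4 and `is_unstable(li)` is true.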


Total  Totals  Index  (TTI)- 

The TTI (Miller, 1972) consists of two components:
Vertical Totals (VT) and Cross Totals (CT). VT represents
static stability between the 850mb and 500mb levels, while
the CT includes a moisture parameter, the 850mb dew point
temperature. As a result, TTI accounts for both static
stability and 850mb moisture. However, TTI can be
misleading in situations where the low-level moisture
resides below the 850mb level. In addition, if a significant
capping inversion is present, convection may be inhibited
even when TTI values are strong.

TTI, like SWEAT (described next), is actually a
compound index designed to better predict the occurrence of
severe weather, not just general thunderstorms. In other
words, it was developed for use when such indices as SSI or
LI indicate that thunderstorms may occur. However, it is
another commonly derived index that many
novice forecasters may weigh equally with SSI and
LI to determine relative instability. This is why all the
commonly derived indices were used in this study.
Additionally,  it  is  unknown  whether  any  of  these  indices 
have  a  predictive  relationship  to  CG  lightning  strikes 
within  50nm  of  a  station.  It  appears  that  the  predictive 
potential  of  the  indices  to  CG  lightning  activity  within  a 
specified  radius  of  a  sounding  location  has  never  been 
assessed.  It  will  be  seen  later  that  in  fact  some  of  the 
indices  developed  specifically  for  severe  weather  indication 
appear  to  correlate  well  to  CG  lightning  counts.  TTI  is 
computed  as  follows: 

TTI  =  (T850  -  T500)  +  (D850  -  T500)  (3) 

To calculate the TTI, two values are computed from the
sounding: the vertical totals (VT) and the cross totals
(CT). VT is a measure of the vertical stability without
regard for moisture and is computed by
subtracting the 500mb temperature (T500) from the 850mb
temperature (T850). CT is a measure of stability that
includes moisture and is found by subtracting T500 from the
850mb dew point temperature (D850). The Total Totals Index (TTI)
is simply the sum of VT and CT. Forecasters evaluate
thunderstorm potential according to the general guidelines
provided by Miller (1972). The TTI is the most
reliable single predictor of severe activity for both warm
and cold seasons. During 1964 and 1965, 92 percent of all
reported tornadoes occurred with a TTI of 50 or greater.
Most widespread tornado outbreaks occurred with a TTI of 55
or greater (Miller et al., 1972). High values of TTI can
occur even with insufficient low-level moisture (indicated by
CT), which is required for convective activity. In other
words, a low CT combined with extremely high VT values can
produce misleading TTI values. This is another reason to
integrate other indices into a forecaster's "convective
potential equation". Other indices account for various
other temperature and moisture parameters that the TTI may
miss with its single consideration of moisture at the 850mb
level.

TTI  must  be  used  with  careful  attention  to  either  the 
CT  value  or  the  actual  low-level  moisture  amounts,  since  it 
is  possible  to  have  a  large  TTI  value  with  insufficient  low- 
level  moisture  to  support  thunderstorms. 
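
The relationship between VT, CT, and TTI in equation (3) can
be sketched in a few lines of Python. This is only an
illustration: the function names and the two sample soundings
are hypothetical, and temperatures are assumed to be in
degrees Celsius.

```python
def vertical_totals(t850, t500):
    """VT: 850mb temperature minus 500mb temperature (deg C assumed)."""
    return t850 - t500


def cross_totals(d850, t500):
    """CT: 850mb dew point minus 500mb temperature (deg C assumed)."""
    return d850 - t500


def total_totals(t850, d850, t500):
    """TTI = VT + CT, following equation (3)."""
    return vertical_totals(t850, t500) + cross_totals(d850, t500)


# Hypothetical moist sounding: VT = 30, CT = 26, TTI = 56.
moist = total_totals(t850=18.0, d850=14.0, t500=-12.0)

# Hypothetical dry sounding: VT = 38, CT = 16, TTI = 54.
# TTI still exceeds 50 although the moisture term (CT) is much lower.
dry = total_totals(t850=26.0, d850=4.0, t500=-12.0)

print(moist, dry)
```

The second case shows why CT should be inspected alongside
TTI: a steep lapse rate alone can push the total above 50.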
Severe Weather Threat Index (SWEAT) -

The SWEAT Index (Miller et al., 1972) evaluates the
potential for severe weather by combining both kinematic
(wind) and thermodynamic information into one index. It is
one of the more complex indices derived in this study, and
as a result it has one of the highest missing data rates.
The algorithm used by AFCCC to compute this index requires
wind parameter measurements at specified height levels. If
any of these required measurements are missing, then the
index cannot be calculated. The parameters include low-level
moisture (850mb dew point), instability (via TTI), lower and
middle-level (850 and 500mb) wind speeds, and warm air
advection (veering between 850 and 500mb). Unlike KI, the
SWEAT index was originally developed to assess severe
weather potential, not just ordinary thunderstorm potential.


SWEAT = (12*850Td) + (20*[TTI - 49]) + (2*f850) + f500 + (125*[s + 0.2])  (4)

where,

o  850Td is the dew point temperature at 850mb,
o  TTI is the total-totals index,
o  f850 is the 850-mb wind speed (in knots),
o  f500 is the 500-mb wind speed (in knots), and
o  s is the sine of the angle between the wind
   directions at the 500mb and 850mb levels (thus
   representing the directional shear in this layer),
   which equates to the amount of warm air advection
   between the layers.

The last term in the equation (the shear term) is set to
zero if any of the following criteria are not met:

1) 850mb wind direction ranges from 130 to 250 degrees,
2) 500mb wind direction ranges from 210 to 310 degrees,
3) 500mb wind direction minus the 850mb wind direction is a
   positive number, and
4) both the 850 and 500mb wind speeds are at least 15 knots.

No term in the equation may be negative; any term that would
be negative is set to zero.
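
The rules above can be sketched in Python as follows. This
is not the AFCCC FORTRAN algorithm, only an illustration of
equation (4) and the four shear criteria; the function name,
argument names, and sample values are hypothetical, and the
dew point is assumed to be in degrees Celsius.

```python
import math


def sweat_index(td850, tti, f850, f500, dir850, dir500):
    """SWEAT per equation (4); speeds in knots, directions in degrees."""
    # Thermodynamic terms; a negative term is replaced by zero.
    moisture = max(12.0 * td850, 0.0)
    instability = max(20.0 * (tti - 49.0), 0.0)

    # Wind speed terms.
    low_wind = 2.0 * f850
    mid_wind = f500

    # The shear term is kept only if all four criteria are met.
    criteria_met = (
        130 <= dir850 <= 250
        and 210 <= dir500 <= 310
        and dir500 - dir850 > 0
        and f850 >= 15
        and f500 >= 15
    )
    if criteria_met:
        s = math.sin(math.radians(dir500 - dir850))
        shear = max(125.0 * (s + 0.2), 0.0)
    else:
        shear = 0.0

    return moisture + instability + low_wind + mid_wind + shear


# Hypothetical sounding with veering winds (180 deg to 240 deg).
veering = sweat_index(12.0, 52.0, 25.0, 40.0, 180.0, 240.0)

# Same sounding, but the 850mb direction fails criterion 1.
no_shear = sweat_index(12.0, 52.0, 25.0, 40.0, 100.0, 240.0)

print(round(veering, 1), no_shear)
```

Treating a missing input as "index cannot be calculated" is
left to the caller in this sketch.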

Guidance values developed by the Air Weather Service
suggest severe storms may be possible for SWEAT values of
250-300 if strong lifting is present. In addition,
tornadoes may occur with SWEAT values below 400,
especially if convective cell and boundary interactions
increase the local shear, which cannot be resolved in this
index. The SWEAT value can increase significantly during
the day, so low values based on 12Z soundings may be
unrepresentative if substantial changes in moisture,
stability, and/or wind shear occur during the day. SWEAT
values of about 250-300 indicate a greater potential for
significant thunderstorms, but as with many of the stability
indices, no definitive "magical" thresholds have been
developed for CG lightning activity.

K-Index  (KI) - 

The K index (George, 1960) or K Value is a measure of
thunderstorm potential based on the vertical temperature
lapse rate along with the amount and vertical extent of
low-level moisture in the atmosphere. The KI is computed as
follows:

KI  =  T850  +  D850  -  T700  +  D700  -  T500  (5)

The KI should be used to analyze the potential for air mass
thunderstorm occurrence, not the potential occurrence of
frontal thunderstorms and not the potential severity of a
thunderstorm. The temperature difference between the 850mb
and 500mb heights is the parameter used to find the vertical
lapse rate, and the 850mb dew point and the 700mb dew point
depression are used to evaluate the moisture content of the
air, as well as the vertical extent of the moist layer.
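
Grouping the terms of equation (5) makes the three physical
ingredients explicit. The short Python sketch below is only
illustrative; the function names and sample values are
hypothetical and temperatures are assumed to be in degrees
Celsius.

```python
def k_index(t850, d850, t700, d700, t500):
    """KI grouped by physical meaning; algebraically equal to equation (5)."""
    lapse_rate_term = t850 - t500          # 850mb to 500mb temperature difference
    low_level_moisture = d850              # 850mb dew point
    dewpoint_depression_700 = t700 - d700  # dryness of the 700mb level
    return lapse_rate_term + low_level_moisture - dewpoint_depression_700


def k_index_flat(t850, d850, t700, d700, t500):
    """KI written term by term, exactly as in equation (5)."""
    return t850 + d850 - t700 + d700 - t500


# Hypothetical sounding: lapse term 30, moisture 14, depression 4, KI = 40.
ki = k_index(t850=18.0, d850=14.0, t700=6.0, d700=2.0, t500=-12.0)
print(ki)
```

A large 700mb dew point depression (a shallow moist layer)
lowers the index even when the lapse rate term is large.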

As  was  mentioned  earlier,  each  index  has  its  own 
advantages  and  disadvantages.  The  main  weaknesses  depend 
primarily  upon  the  levels  of  analyses  used  for  each  index  or 
the  shape  of  the  atmospheric  profile. 
III.  Data  Collection  and  Review


It is important to appreciate the history, background,
and potential weaknesses of the data used in this study.
There are two separate sources of data: lightning summary
output derived from the NLDN, and stability indices derived
from the moisture, wind, and temperature profiles of upper
air soundings. Each stability index measures the potential
instability of the atmosphere by examining different
combinations of temperature and moisture at pre-determined
pressure levels (or heights).

A more formal definition of a stability index is: the
analyzed convective potential of an upper-air sounding
expressed as a single numerical value. The importance of
the stability indices was pointed out by a study conducted
by Air Weather Service and the National Severe Storms
Forecast Center of the National Weather Service (Miller,
1972). This survey used 328 tornado cases to determine
which atmospheric conditions were necessary for the
development of severe weather. The parameters were ranked
in order of importance based on both computer analysis and
forecasting experience. An analogy is drawn to this
approach later in this study using data mining techniques.
Results showed that the second most influential parameter
for convective forecasting is the stability of the
atmosphere itself, upon which the applications of the
stability indices are based. Additional stability indices
have been developed since the original study (Miller, 1972)
and are considered in this study as well. Each index takes
different atmospheric parameters into consideration. Not
considered, but readily available from upper-air soundings,
are the wind field structures and the indices derived from
them, such as helicity or upper-level flow.


3.1  Data  Methods 

In an attempt to improve weather forecasts for
cloud-to-ground (CG) lightning activity, which is inherently
related to thunderstorm convection, stability indices and CG
lightning relationships were examined for 13 different upper
air stations in the Midwest. Again, no distinction is made
at this point between severe and non-severe types of
convection. An exhaustive effort was made to utilize a
large sample database of upper air soundings, from which the
indices are derived for each location, along with highly
accurate CG lightning summaries from the NLDN from 1993
through 2000. CG strikes were summarized at different radii
(50nm, 25nm, and 10nm) in twelve-hour increments (12Z to 00Z
and 00Z to 12Z) to coincide with the matching sounding
times.

It was quickly determined that the lightning data for the
50nm radius was the most representative for an area around
each station. The CG lightning strike summaries for the
25nm and 10nm radii seemed to capture too few occurrences,
and therefore no relationships were inferred between the
indices and CG strike activity at those radii.

The  average  horizontal  spacing  of  the  upper-air 
sounding  network  in  the  Midwest  is  approximately  200nm. 
However,  the  summer  environment  in  the  Midwest  is  often 
characterized  by  shower-producing  systems  that  occur  on 
smaller  spatial  scales.  These  may  be  missed  by  the  50nm 
radius  used  in  this  study  for  CG  lightning  strike 
comparisons,  yet  may  still  be  representative  of  the  general 
area  around  the  sounding.  This  fact  could  potentially 
degrade  the  significance  of  any  results  since  many  CG 
lightning  strikes  may  be  missed. 

The time scales of convective activity in this region
during the summer months are often on the order of only a
few hours. Thus, in a rapidly changing environment, such as
storms triggered by a fast moving frontal system, these
sounding analyses may miss key moisture and temperature
changes affecting the stability of the atmosphere.

Some potential limitations to this approach, based on
past research, can be anticipated. The goal in this study
is to build upon past studies that utilized the stability
indices but proved inconclusive (Huntrieser et al., 1996).
Separately, a study was conducted in the High Plains with a
high-resolution mesonet with 25-50km spacing and twice the
number of sounding observations normally available on a
day-to-day basis (Mueller et al., 1993).

The results indicated that, much to their dismay, a
high resolution of timely mesonet upper-air soundings
provided no additional skill in predicting convective
weather outbreaks. Therefore, their conclusion was that
the existing sounding observation resolution should be
adequate for research purposes. This point is very
important, since it helps further justify the available
sounding databases used in this study.

Parameters such as helicity, streamwise vorticity, and
hodographs have proven results when combined with
atmospheric stability indices but require more detailed
analysis than could be considered in this study (AWS TR
79-006, 1990).

3.2  Data  Sources


The Air Force Combat Climatology Center (AFCCC) located
in Asheville, NC provided the upper-air stability indices
and the NLDN summary data used in this study. AFCCC has an
extensive database of NLDN lightning data and raw upper air
data for the Midwest with the ability to provide the data in
many different formats.

Lightning  summary  data  for  CG  strikes  within  a  50nm 
radius  of  each  location  for  each  12  hour  period  were 
calculated  using  archived  NLDN  data  and  ArcView  GIS  mapping 
applications.  Once  the  raw  data  was  formatted,  ArcView 
easily  determined  the  daily  counts.  Typically  it  took  one 
day  per  location  to  complete  the  summaries. 
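
The summaries in this study were produced with ArcView. As
an illustration of the same idea only, the Python sketch
below counts strikes within a radius of a station for each
12-hour period. The station coordinates, strike records,
function names, and the choice to label each period by its
starting sounding time are all assumptions made for the
example; distances use the haversine great-circle formula.

```python
import math
from collections import Counter
from datetime import datetime

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in nautical miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def count_strikes(station, strikes, radius_nm=50.0):
    """Count strikes within radius_nm, keyed by (date, 12-hour period)."""
    counts = Counter()
    for when, lat, lon in strikes:
        if distance_nm(station[0], station[1], lat, lon) <= radius_nm:
            # Assumption: a period is labeled by the sounding time that starts it.
            period = "00Z" if when.hour < 12 else "12Z"
            counts[(when.date(), period)] += 1
    return counts


# Hypothetical station and strike records (UTC time, latitude, longitude).
station = (35.0, -97.0)
strikes = [
    (datetime(1995, 6, 1, 3, 0), 35.5, -97.0),    # about 30nm away, 00Z period
    (datetime(1995, 6, 1, 15, 30), 35.0, -97.5),  # about 25nm away, 12Z period
    (datetime(1995, 6, 1, 16, 0), 36.0, -97.0),   # about 60nm away, excluded
]
counts = count_strikes(station, strikes)
print(dict(counts))
```

Calling count_strikes with radius_nm=25.0 or radius_nm=10.0
gives the smaller-radius summaries mentioned in Section 3.1.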

The  stability  indices  utilized  in  this  study  were 
computed  using  archived  upper-air  sounding  data  ingested  by 
FORTRAN  algorithms  developed  at  AFCCC.  These  were  much 
easier  to  compute  than  the  lightning  summaries. 
Unfortunately,  algorithms  were  not  available  for  every 
stability  index  and  time  constraints  prohibited  the 
development  of  new  algorithms  to  create  any  additional 
indices needed. Applications of other indices not
considered in this study are recommended for future work in
the last chapter.

IV.  Methods  of  Data  Analysis

4.1  Analysis  of  stability  indices  in  deciphering 
homogeneous  datasets 

Box  plots  and  histograms  of  the  indices  and  CG 
lightning  data  were  constructed  for  each  location  with 
results  displayed  for  LBF  (North  Platte,  NE)  and  OUN 
(Norman,  OK) .  A  more  thorough  attempt  was  made  to  analyze 
the  datasets  for  display  at  LBF  and  OUN  because  of  their 
large  available  datasets  (1993-2000)  and  representative 
locations  in  the  northern  and  southern  portions  of  the 
study  (Figure  2) . 


Figure  2.  Locations  used  for  this  study  with  emphasis  on 
LBF  and  OUN. 


Determining  which  months  are  homogeneous  in  this  study 
includes  a  month-by-month  assessment  of  the  available  data. 
It  was  ascertained  that  certain  potentially  unreliable  non- 
homogeneous  datasets  should  be  eliminated  and  the  rest 
combined.  Combining  the  significant  months  helped  maximize 
the  database  for  each  location. 

In Figures 3-6, not surprisingly, a noticeable peak
toward more unstable values of selected indices is evident
in the summer months for all locations and for both sounding
times (00Z and 12Z). The summer months from May to
September (5-9) show the peak instabilities most clearly.
Note that only positive values for CAPE are used.

There is also a noticeable increase in the variability
(range of values) of the indices between 00Z and 12Z in
Figures 3-6. The effects that morning inversions have on
the 12Z soundings during the "active" season are clearly
evident. The 12Z CAPE values are especially variable
because of the way CAPE is calculated (an integration
through the atmosphere, which can be masked at inversion
levels). It is shown shortly that a more unstable range of
values for most indices is required for the 12Z soundings to
be associated with CG lightning activity. It is not
surprising that the afternoon 00Z soundings appear to be
most representative when convection is possible or
expected.

Mueller et al. (1993) determined that forecasted
afternoon soundings correlated best to thunderstorm
activity. In their study, the 12Z forecast soundings
performed better than the 00Z soundings. Forecast
soundings were considered in this study, but forecast
upper-air model data are not archived in any known data
center and time limitations prohibited their development.
For that reason, forecast soundings for each location are
suggested for future research.

4.2  Analysis  of  Cloud-to-Ground  Lightning  Activity  in
Deciphering  Homogeneous  Datasets

Bar plots of mean monthly CG lightning activity also
show a distinct peak during months 5-9 for 00Z and 12Z at
both locations (Figures 7 and 8). Bar plots for the total
number of days with any CG lightning activity (labeled
CG_COUNTS) were also constructed in Figures 9 and 10. It
may be arguable whether to include months 4 or 10 at 12Z
for OUN as active lightning months, but LBF definitely
supports the hypothesis that months 5-9 are the most active
for CG lightning activity and would exhibit the most
homogeneity for all locations.

4.3  Maximizing  the  Datasets 

To optimize the usefulness of the database for each
location, an effort was made to see if it would be
reasonable to merge specific "active" months together for
analysis purposes. The indices and CG lightning summaries
show a significant variability by season. CG lightning
counts are significantly lower or non-existent and index
values indicate much higher variability for the "cool" or
inactive months from October to April. Conversely,
significant peaks in CG lightning counts and less
variability for most indices existed for months 5-9.

Accordingly, to further optimize the usefulness of the
dataset for each location, it was decided to combine the
more homogeneous data set of just the warm "most active"
months (5-9) for determining underlying threshold values of
the indices to CG lightning activity.
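
The month-by-month screening described above can be sketched
in Python as follows. The records and function names are
hypothetical and serve only to show the two steps: a mean CG
count per month, then a subset restricted to months 5-9.

```python
from collections import defaultdict

ACTIVE_MONTHS = (5, 6, 7, 8, 9)  # months 5-9, as selected in the text


def monthly_mean_counts(records):
    """Mean CG count per month from (month, cg_count) records."""
    totals = defaultdict(float)
    periods = defaultdict(int)
    for month, cg_count in records:
        totals[month] += cg_count
        periods[month] += 1
    return {month: totals[month] / periods[month] for month in sorted(totals)}


def active_season(records):
    """Keep only the records that fall in the active months."""
    return [(month, count) for month, count in records if month in ACTIVE_MONTHS]


# Hypothetical 12-hour summaries: (month, CG strikes within 50nm).
records = [(1, 0), (1, 0), (4, 3), (6, 40), (6, 60), (7, 120), (7, 80), (11, 0)]

means = monthly_mean_counts(records)
warm = active_season(records)
print(means)
print(warm)
```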
Figure 7.  Mean CG count by time/month for LBF.

[Bar chart: OUN (Norman, OK), 1993 - May 2000. Vertical
axis: counts from 0 to 120; horizontal axis: MONTH; bars
grouped by HOUR (00Z and 12Z).]

Figure 10.  Total CG lightning days within a 50nm radius
of OUN.


4.4  Developing  a  Baseline  Climatology  of  Stability  Index 
Values  for  Predicting  CG  Lightning  Activity. 

In their article "A baseline climatology of
sounding-derived supercell and tornado forecast parameters",
E. Rasmussen and D. Blanchard discuss the need for a
baseline climatology relating sounding threshold values to
weather events in support of operational thunderstorm
forecasts. Their study concentrated primarily on the
climatology of CAPE and other more dynamic weather
parameters in relation to severe thunderstorm and tornado
occurrences. The question that they felt needed to be
answered was "at what values or thresholds of stability
indices should forecasters become concerned about
thunderstorm potential?" Since the CG lightning summaries
used in this study are directly related to thunderstorm
occurrences, this study, with its exhaustive climatological
database of indices, should be able to help answer their
question. In particular, it seeks to determine threshold
values for individual locations and, if a relationship
exists, for a forecast region comprised of the upper-air
locations in Figure 1.

Weather forecasters need to know the climatological
range of values of each stability index on days with any CG
lightning activity. Up until now it appears that an
exhaustive study of the predictability of NLDN lightning
summaries from the indices has yet to be made. A suggested
threshold range of values by region was given for
thunderstorms in a recent publication from the Air Force
Weather Agency (AFWA) titled "Meteorological Techniques"
(AFWA TN-98/002, 1998). This AFWA Technical Note suggests
a range of values for the indices used in this study and
categorizes the range of index values as general
thunderstorm, severe thunderstorm, or tornado indicators
(see Table 3). For the purpose of this study, the general
thunderstorm category is the applicable one, since the
study attempts to predict any amount of CG lightning
activity. It should be noted that no inference is made
about the severity of each thunderstorm; that is left for a
future study. However, an inference is made as to the
potential amount of CG lightning expected later in chapter
5 on regression trees.


Table 3.  Suggested range of index values as general
thunderstorm indicators (AFWA TN-98/002, 1998).

Index               Region best applied            Weak (Low)    Moderate       Strong (High risk)
CAPE                East of Rockies                300 to 1000   1000 to 2500   2500 to 5300
K-Index             East of Rockies in moist air   20 to 26      26 to 35       > 35
KO-Index            Cool, moist climates (Pacific  > 6           2 to 6         < 2
Lifted Index        All                            0 to 2        -3 to -5       < -5
Showalter           CONUS                          > 3           2 to -2        < -3
Total Totals        East of Rockies                44 to 45      46 to 48       > 48
SWEAT (for Severe)  Midwest and Plains             < 275         275 to 300     > 300

The statistical software package SPSS (version 10)
allows the user to select a range of values for each index,
permitting an easy assessment of the merit of the suggested
range of values from Table 3 for each index category (Weak,
Moderate, Strong). One must keep in mind that weak,
moderate, and strong in this case represent an "indication"
for general thunderstorms without reference to their
severity.

These threshold ranges for each stability index are
used as a starting point to observe any correlations to CG
lightning occurrence. Each location's index data was
merged with the CG lightning data using the statistical
software package S-Plus. The merge by variable
(MONTH/DAY/YEAR/HOUR) command allowed the dates and times
of the indices to sync with the lightning data. The
initial relationships between the index values and CG
lightning occurrence were then displayed in SPSS as simple
line plots, by month, of CG lightning occurrence versus
counts of the number of times each index was within the
thresholds established in AFWA TN-98/002.
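
The merge and the monthly threshold counts were done in
S-Plus and SPSS. The Python sketch below only illustrates
the logic: records are joined on a (year, month, day, hour)
key, and then, per month, the number of soundings meeting a
threshold is tallied next to the number of periods with any
CG lightning. All names and sample records are hypothetical;
the TTI > 48 test is the "strong" threshold from Table 3.

```python
from collections import Counter


def merge_by_time(index_rows, lightning_rows):
    """Inner join of two dicts keyed by (year, month, day, hour)."""
    return {
        key: (indices, lightning_rows[key])
        for key, indices in index_rows.items()
        if key in lightning_rows
    }


def monthly_counts(merged, index_name, in_range):
    """Per month: soundings meeting the threshold, and periods with CG > 0."""
    hits, events = Counter(), Counter()
    for (_year, month, _day, _hour), (indices, cg_count) in merged.items():
        value = indices.get(index_name)
        if value is not None and in_range(value):
            hits[month] += 1
        if cg_count > 0:
            events[month] += 1
    return hits, events


# Hypothetical 00Z records for one station.
index_rows = {
    (1995, 6, 1, 0): {"TTI": 52.0},
    (1995, 6, 2, 0): {"TTI": 45.0},
    (1995, 6, 3, 0): {"TTI": 49.0},   # no matching lightning summary
    (1995, 1, 5, 0): {"TTI": 50.0},
}
lightning_rows = {
    (1995, 6, 1, 0): 120,
    (1995, 6, 2, 0): 0,
    (1995, 1, 5, 0): 0,
}

merged = merge_by_time(index_rows, lightning_rows)
hits, events = monthly_counts(merged, "TTI", lambda v: v > 48)
print(dict(hits), dict(events))
```

In this toy case the threshold is met once in January with
no lightning, which is the kind of cool-season over-count
the line plots are meant to expose.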

For instance, KO and SSI seem to have an inverse
relation. This was a bit confusing at first, but no lower
or upper limits were constrained on these two indices,
which suggests that limits should be applied. The SWEAT
index matched the best in the summer months, but
significantly over-counts in the winter/spring months.

Figure 11.  00Z LBF CG<50nm occurrence in thick line vs.
"weak" thresholds for index counts by month.


This suggests that different thresholds for the "cool"
months might be appropriate, adjusting the lower thresholds
toward more unstable values. TTI and KI have a reverse
relationship in that they seem to capture a correlation in
the "cool" months while dramatically under-counting events
during the "warm" active season. In this case, an
adjustment should be made to include more unstable values.
Before any adjustments were made, the "strong" threshold
values from AFWA TN-98/002 were compared in Figure 12.

The predictive ability of the indices using the
"strong" threshold values indicates that the thresholds
established for TTI matched astonishingly well to the
occurrence of CG lightning events (N>0 for OBCOUNT). KO
thresholds significantly over-counted events while the
remaining indices substantially under-counted.


[Legend: N>48 for TTI; N>35 for KI; N>300 for SWEAT;
N<-5 for LI; N<2 for KO; N<-3 for SSI; N in (2500, 5500)
for CAPE; N>0 for CG COUNT]

Figure 12.  00Z LBF CG occurrence as thick line vs.
"strong" threshold index counts by month.


With these results, an attempt was made to determine a
more suitable range of threshold values, keeping in mind
the relationships of the indices found for the "weak" and
"strong" thresholds. Results for the best-fit thresholds at
LBF and OUN are displayed in Figures 13-16.

LBF Best Annual Index Thresholds.

Figure 14.  12Z - LBF Best Annual Index Thresholds.

Figure 16.  12Z - OUN Best Annual Index Thresholds.

3.5  A  Better  Range  of  Index  Values


Categorical box and whisker plots of the annual range
of index values for 00Z OUN (Figures 15 and 16) provide
another way to evaluate the annual range of values each
index can take on for days with and without CG lightning
(labeled in the figures as none and t-storm). Suggested
improvements and results for LBF 00Z and 12Z are displayed
in Tables 4 and 5.

A box and whisker diagram illustrates the spread of a
set of data about the median. It also displays the upper
quartile, lower quartile, and interquartile range of the
data, with 50% of the data residing inside the "box". A
shorter box in this case indicates a narrower range of
values and therefore more consistency as a categorical
predictor.

The annual categorical box and whisker plots once
again reveal the decreased variability of most indices for
the t-storm category.

Figure 18.  00Z - OUN categorical box plot of SWEAT & LI -
annual summary.

Figure 19.  00Z - OUN categorical annual summary.

Figure 20.  00Z - OUN categorical box plot of CAPE -
annual summary.


With 50% of the index values falling inside the "box",
the plots support the hypothesis that the suggested ranges
of values in Tables 4 and 5 are superior to the ranges
given for general thunderstorms in AFWA TN-98/002
(Table 3). For example, the t-storm "box" for KO in
Figure 15 spans values from -18 to 0, which relates well to
the suggested threshold range of values for 00Z OUN in the
southern plains in Table 5.
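
The "box" of a categorical box plot is the interquartile
range of each category. The Python sketch below shows how
such a range could be read off numerically. The KO values
are invented for illustration, and the quartile convention
of statistics.quantiles (its default "exclusive" method) is
an assumption that may differ from the one used by the
plotting software in this study.

```python
from statistics import quantiles


def box_range(values):
    """Return (lower quartile, upper quartile), the edges of the box."""
    q1, _median, q3 = quantiles(values, n=4)
    return q1, q3


# Hypothetical KO values grouped by lightning category.
ko_by_category = {
    "t-storm": [-20, -18, -12, -9, -5, 0, 3],
    "none": [-10, -4, 0, 2, 5, 9, 14],
}

boxes = {label: box_range(values) for label, values in ko_by_category.items()}
for label, (low, high) in boxes.items():
    print(label, low, high, "box length:", high - low)
```

The less the two boxes overlap, the more useful the index is
for separating lightning days from non-lightning days.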
Table 4.  Suggested range of values for predictive ability
of each index to CG lightning occurrence in the
Northern Plains.

Index          Region applied    % time in category   12Z           00Z
CAPE           Northern Plains   66.50%               500 to 6000   500 to 6000
K-Index        Northern Plains   57.10%               25 to 35      25 to 35
KO-Index       Northern Plains   56.30%               -11 to 0      -18 to -3
Lifted Index   Northern Plains   70.80%               < 0           < 0
Showalter      Northern Plains   71.70%               < 0           < 0
Total Totals   Northern Plains   67.20%               > 47          > 47
SWEAT          Northern Plains   73.90%               > 200         > 200


Table 5.  Suggested range of values for predictive ability
of each index to CG lightning occurrence in the
Southern Plains.

Index          Region applied    % time in category   12Z            00Z
CAPE           Southern Plains   56.40%               1300 to 4500   1400 to 4000
K-Index        Southern Plains   49.40%               23 to 36       22 to 36
KO-Index       Southern Plains   48.50%               -14 to -2      -19 to 0
Lifted Index   Southern Plains   57.30%               < -2           < -2
Showalter      Southern Plains   57.60%               < -1           < -1
Total Totals   Southern Plains   56.50%               > 47           > 47
SWEAT          Southern Plains   51.20%               > 210          > 190

3.6  Summary  of  Data  Analysis  Methods

A starting point for assessing the utility of each
index to CG lightning within 50nm was made by initially
employing the suggested range of index values in AFWA
TN-98/002 for general thunderstorms (Table 3). The "weak"
threshold range of values seemed to have the lowest
relationship and significantly over-counted CG lightning
events, while the "strong" threshold ranges significantly
under-counted them. An improved range of values was
determined analytically and suggested in Tables 4 and 5.
Wider ranges of values for CAPE were required for 12Z OUN
and slightly more unstable values were required for the 00Z
sounding thresholds to be more germane. However, no effort
was made to imply the severity of each storm event.
Instead, these suggested ranges are applicable to any CG
lightning event (within 50nm) relative to the indices used.
It is assumed in this case that the indices are
representative of the atmosphere within a 50nm radius to
determine CG lightning occurrence. The suggested ranges of
values in Tables 4 and 5 agree with the ranges shown in the
box and whisker plots for the sampling locations (LBF and
OUN).

V.  Regression  Analysis


In an effort to improve upon the suggested annual
range of values of indices best determining CG lightning
occurrence established in Chapter 3, regression analyses
were conducted to test statistically whether individual
indices, or a combination of them, have any utility as
predictors of CG lightning.

It  is  important  to  note  that  for  regression  analysis, 
only  the  "active"  months  5-9  together  are  considered  for 
each  location  using  the  reasoning  established  in  Chapter  2 
on  homogeneous  datasets.  The  categorical  box  and  whisker 
plots  in  this  case  should  appear  less  decisive  because  the 
non-active  months  are  not  considered. 

First, linear regression models were fitted, followed
by an explanation of the results, the limitations, and the
obvious shortcomings of linear regression for this
application. Next, logistic regression models were fitted
to the occurrence or non-occurrence of CG lightning
activity.
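
To make the second step concrete, the Python sketch below
fits a one-predictor logistic regression of lightning
occurrence (1) or non-occurrence (0) on a single index. It
is not the statistical package procedure used in this study:
the data are invented, and the fit uses plain gradient
descent on a standardized predictor purely for illustration.

```python
import math


def standardize(values):
    """Return z-scores together with the mean and standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values], mean, std


def predict(b0, b1, x):
    """Probability of occurrence from the logistic function."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(x, y, rate=0.5, steps=2000):
    """Fit intercept b0 and slope b1 by gradient descent on the log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            error = predict(b0, b1, xi) - yi
            g0 += error
            g1 += error * xi
        b0 -= rate * g0 / n
        b1 -= rate * g1 / n
    return b0, b1


# Hypothetical TTI values and CG lightning occurrence flags.
tti = [40, 42, 44, 46, 48, 50, 52, 54]
cg = [0, 0, 0, 1, 0, 1, 1, 1]

z, mean, std = standardize(tti)
b0, b1 = fit_logistic(z, cg)
p_low = predict(b0, b1, (40 - mean) / std)
p_high = predict(b0, b1, (54 - mean) / std)
print(round(p_low, 2), round(p_high, 2))
```

Unlike linear regression, the fitted value is bounded
between 0 and 1 and can be read as a probability of
occurrence.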

Chapter 6 gives an introduction to, and an in-depth
look at, the possibilities of using classification and
regression trees as a forecast tool for CG lightning
prediction and intensity, with encouraging results.

5.1  Initial  Regression  Assessment


Again, categorical box and whisker plots, calculated
for months 5-9, were used to depict the distributions of
the predictor variables (indices) as functions of CG
lightning occurrence/non-occurrence (labeled as
T-storm/none, respectively) in Figures 21, 22, and 23.

Interestingly, comparing the categorical box and
whisker plots, most indices appear to have a predictive
capability, displaying less variability for CG lightning
(T-storm) events and more variability for non-events
(none). For predictive ability, the least possible overlap
between the none and T-storm categories is desired. No
single index stands out significantly, but a few seem less
capable or different from the rest. KO and CAPE display the
most significant category overlap for both locations,
indicating the least predictive capability. SWEAT and CAPE
are the only indices whose variability (length of box)
increases noticeably for the T-storm category, while the
others are less variable in the T-storm category.

Figure 21.  Box and Whisker plots of each index by
category for LBF 12Z (months 5-9).

Figure 22.  Box and Whisker plots of each index by
category for LBF 00Z (months 5-9).

Figure 23.  Box and Whisker plots of each index by
category for OUN 00Z (months 5-9).


Comparing 00Z OUN and 00Z LBF, KO, CAPE, and SWEAT appear to be
the least capable predictors. However, at LBF they are somewhat
better at discriminating between the two categories than at OUN
(less category overlap). In fact, KO does not appear able to
distinguish between the two categories at all at OUN in Figure 19.
The predictive relationship of the indices to CG lightning
activity is examined with regression techniques next.


5.2  Stepwise  Regression 

Stepwise regression is a popular method when searching for good
subset models, especially, as in this case, when the number of
candidate independent variables is large. A variable had to be
significant at the 95% confidence level before it was considered
for model inclusion.

Stepwise regression indicated that SWEAT alone had the most
significant relationship. This relationship improved somewhat with
the inclusion of TTI. Many of the other variables were dropped
from the model because they were correlated with one another.
R-Squared values ranged from 0.057 at 12Z for OUN to 0.164 at LBF,
with the models significant at the 95% confidence level.

A more detailed linear analysis was computed for 00Z LBF since it
showed the highest propensity for a fitted linear model. The
R-Squared value was improved by forcing the model equation through
the origin, which removes the constant. The R-Squared value for
fitting a linear regression line to all cases was 0.24 (Figure 20).

The best stepwise linear regression model for CG lightning cases
only was:

CG>0 = TTI * (-6.27973) + SWEAT * (3.45951)     (7)
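
As a minimal sketch of how Equation 7 would be applied, the Python
snippet below evaluates the fitted expression for two sets of index
values. The function name and the sample TTI and SWEAT values are
hypothetical and chosen only for illustration; only the two
coefficients come from Equation 7.

```python
def predicted_cg_count(tti, sweat):
    """Evaluate Equation 7 (00Z LBF, CG lightning cases only)."""
    return tti * (-6.27973) + sweat * 3.45951

# Hypothetical index values, not taken from the study's data.
high = predicted_cg_count(tti=50.0, sweat=300.0)
low = predicted_cg_count(tti=45.0, sweat=50.0)

print(round(high, 1))
print(round(low, 1))
```

With these illustrative inputs the second case gives a negative
predicted count, one symptom of how poorly a straight line through
the origin suits lightning counts, which cannot be negative.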

The model response plots in Figures 21 and 22 show the problem
with fitting a linear, or even a quadratic, regression line to CG
lightning activity. A high density of "none" (non-occurrence)
cases is observed along the x-axis (the fitted equation) for a
large range of SWEAT values. Figure 22 shows a slight "clean-up"
of this density by plotting only the CG lightning occurrence
cases. There is still an obvious concentration of points at very
low CG lightning counts. Perhaps this is evidence that a 100 nm
radius would be more adequate, since more lightning counts would
result, but more than likely the density pattern would remain.
Regardless, days without CG lightning (none) have now been
eliminated, and attention is focused on linear regression methods
to determine CG lightning counts when a CG lightning event is
expected. There is less utility under this rationale, and the best
possible regression fit was a quadratic expression (see
Figure 24).


The quadratic expression is shown as the best R-Squared fit to the
regression model using only the SWEAT index. The 95% confidence
intervals are superimposed and the R-Squared value is increased to
0.35, which is not ideal and only a slightly better fit than when
all cases are considered (R-Squared = 0.28). The quadratic
expression appears to be better at capturing the CG lightning
densities at the lower range of SWEAT values, hence the higher
R-Squared value.


Figure 24. Fitted linear regression results (x-axis: Fitted:
TTI * (-6.28) + SWEAT * (3.46)).

Figure 26. 00Z CG only - LBF best regression fit (QUADRATIC).


5.3 Logistic Regression


Logistic regression analysis extends the techniques of multiple
regression analysis to research situations in which the outcome
variable is categorical. Logistic regression was used to study how
the odds of CG lightning occurrence versus non-occurrence depended
on the indices as the independent variables. CG lightning counts
are not considered. The interest here was whether CG lightning
occurred at all during the valid 12-hour period.

A transformation of the data was made in SPSS for each location to
add a column labeled "CG.LOG", which stands for CG logistic. The
transformed variable has only two possible outcomes: CG lightning
did occur (T-STORM) or did not occur (NONE). Unlike the linear
regression model fit, logistic regression is based on
probabilities associated with the values of the categorical
response (NONE/T-STORM).

The SPSS logistic model results for 00Z OUN, with a brief
explanation of each test measure, are listed in Tables 6
through 9.

The case-processing summary in Table 6 indicates that a
substantial amount of data was missing (30%). This was primarily
due to the high missing-data rates of the SWEAT, KO, and CAPE
indices, which the logistic regression model did not accommodate.
Therefore, the model eliminated all cases with any missing values.

The classification table (Table 7) summarizes correct and
incorrect estimates of "CG.LOG". The columns are the two predicted
values of "CG.LOG" (NONE and T-STORM), while the rows are the two
observed (actual) values of "CG.LOG". The percentages correct were
similar for both classes, at about 75% overall.


Table 6. Case processing summary.

Unweighted Cases                            N       Percent
Selected Cases     Included in Analysis     745     70.0
                   Missing Cases            319     30.0
                   Total                    1064    100.0
Unselected Cases                            0       .0
Total                                       1064    100.0

Table 7. Classification Table.

                                 Predicted CG.LOG      Percentage
Observed                         NONE      T-STORM     Correct
Step 1   CG.LOG    NONE          276       95          74.4
                   T-STORM       91        283         75.7
         Overall Percentage                            75.0
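
The percentages in Table 7 follow directly from the four cell
counts. The short Python sketch below recomputes them; the
dictionary layout and function name are illustrative choices, and
only the counts are taken from the table.

```python
# Counts from Table 7: outer keys are observed, inner keys are predicted.
table = {
    "NONE": {"NONE": 276, "T-STORM": 95},
    "T-STORM": {"NONE": 91, "T-STORM": 283},
}

def percent_correct(observed):
    """Share of one observed class that the model predicted correctly."""
    row = table[observed]
    return 100.0 * row[observed] / sum(row.values())

total = sum(sum(row.values()) for row in table.values())
overall = 100.0 * sum(table[c][c] for c in table) / total

print(round(percent_correct("NONE"), 1))
print(round(percent_correct("T-STORM"), 1))
print(round(overall, 1), total)
```

The total of 745 matches the number of cases included in the
analysis in Table 6.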

The -2 Log likelihood in Table 8 is directly related to the
deviance measure used for decision trees in the next chapter and
is discussed in greater detail there. A -2 Log likelihood of 794
is rather large and indicates the variability remaining in this
logistic model fit. The R-Square values measure the strength of
association between the indices in the model and the response, and
hence their predictive ability. The association indicated (0.274
and 0.365) is weak.


Table 8. Model Summary.

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      794.458             .274                   .365
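
To show how the two R-Square values relate to the -2 Log
likelihood, the Python sketch below applies the standard Cox &
Snell and Nagelkerke definitions. It assumes that the null model is
an intercept-only model fitted to the same 745 cases, with the
observed class totals taken from Table 7 (371 NONE, 374 T-STORM).
That assumption is not stated in the SPSS output and is made here
only for illustration.

```python
import math

n_none, n_tstorm = 276 + 95, 91 + 283   # observed class totals from Table 7
n = n_none + n_tstorm                   # 745 cases included in the analysis
neg2ll_model = 794.458                  # -2 Log likelihood from Table 8

# Assumed null model: intercept only, so every case gets the base rate.
neg2ll_null = -2.0 * (n_none * math.log(n_none / n)
                      + n_tstorm * math.log(n_tstorm / n))

# Cox & Snell: 1 - (L_null / L_model)^(2/n), written with -2 log likelihoods.
cox_snell = 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)
# Nagelkerke rescales Cox & Snell by its maximum attainable value.
nagelkerke = cox_snell / (1.0 - math.exp(-neg2ll_null / n))

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Under this assumption the sketch reproduces the .274 and .365
reported in Table 8.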

The Hosmer and Lemeshow goodness-of-fit test in Table 9 divides
the cases into deciles based on predicted probabilities, and then
computes a chi-square statistic from observed and expected
frequencies. The p-value of 0.069 is computed from the chi-square
statistic (14.531) with 8 degrees of freedom. Since the null
hypothesis of this test is that the model fits, this value falls
only marginally short of rejecting the fit at the 95% confidence
level and offers little support for the logistic model
(Rice, 1994).
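
The p-value can be checked from the chi-square statistic and its
degrees of freedom. The Python sketch below uses the closed-form
upper-tail probability that holds when the degrees of freedom are
even; the function name is hypothetical, and a general-purpose
statistics library would normally be used instead.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(14.531, 8)
print(round(p_value, 3))
```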

Table 9. Hosmer and Lemeshow Test.

Step   Chi-square   df   Sig.
1      14.531       8    .069


VI. Data Mining (DM) and Decision Trees


Traditionally applied statistical methods seem unfocused as a
predictive tool because of the enormous variability and
overlapping ranges of the index values for events versus
non-events. More revealing ways to interrogate the data were
sought to improve the results of this study. Originally, the plan
was to use SPSS utilities manually to partition the range of
values of each index and find the best probability of event versus
non-event of CG lightning. The same process would then be repeated
with the CG lightning counts as the response variable to try to
establish threshold values of each index, if possible, that best
differentiate between active (large number of CG counts) and
non-active events. Succeeding with these methods would yield a
very useful forecast tool but would require extensive manual work,
which quickly led to a search for an automated process already
developed to handle such a problem. A literature review suggested
that the field of Data Mining (DM) might best serve this purpose.


6.1 Data Mining - A Brief History

The DM field was born in and developed by the computing community
and was not initially embraced by the statistical community. Even
today there are skeptics, but DM is now generally accepted as a
useful statistical tool, especially when traditional statistical
methods fail.

Data mining is an umbrella term that was initially applied with a
negative undertone by the statistical community, and the name
stuck. Other names applied to DM were "fishing" and "data
dredging". It seemed to statisticians of the time that any
invalidation of their elegant analytical solutions to inferential
problems by exploiting data through "guesswork" had to be errant
(Selvin and Stuart, 1966).

The reason decision trees can handle such large databases is their
computational efficiency. The concept of DM has largely been a
commercial enterprise benefiting computer hardware and software
manufacturers that emphasized the high computational abilities
associated with DM (Friedman, 1997). Although significant advances
in computational speed have been made over the years, speed
remains a consideration for new approaches to robust database
research. These advances allow studies of much larger scope than
could be considered before.

There has been an immense amount of research on the uses and
applications of DM tools in prediction modeling, the results of
which have shown that they can and do surpass the best or most
commonly used models for some applications. Therefore, DM methods
should be taken seriously as a statistical prediction tool.

DM is used to discover patterns and relationships in large
observational databases. Statistical software packages such as
S-Plus, SPSS, and SAS have recently included DM packages for
research professionals to use. DM techniques include decision tree
induction, clustering methods, neural and Bayesian networks, and
genetic algorithms, to name a few. Decision trees fall under the
realm of DM and are therefore introduced as a predictive tool for
this study.


6.2  Decision  Trees 

Decision trees are based on a hierarchical "branched" structure
that helps find and plainly display key facets of very large
databases. They are important in many DM fields because they are
very good at seeing through the "noise" of the data and displaying
the most important elements of the results in a straightforward
manner (Friedman, 1997). They are hierarchical in that they find
the best predictor variables recursively, and then rank and
display them according to their ability to predict the response
variable, which is exactly what was desired in this study. As will
be shown, the decision tree method used for this study tests each
index independently (a univariate split) and is simpler in
approach than most other methods. Other methods, such as Oblique
Classifier 1 (OC1), allow for a multivariate tree induction
system. In other words, OC1 decision trees contain linear
combinations of one or more predictors at each tree decision split
(Murthy et al., 1994). The result is an oblique split of the data.
Oblique splits are said to be more powerful than the simpler
univariate test, but also more "expensive" to compute.
Computational expense has been a benchmark for DM tools in the
past because of their efficient algorithms. Especially in the
infancy of DM, cost as a measure of computational speed was a much
higher priority and selling point.

For the classification and regression trees applied in this study,
S-Plus was the program of choice. S-Plus is one of the mainstay
programs that incorporate a built-in suite of data mining
commands, including regression trees, K-means clustering, and
Bayesian networks, to name a few. The difference between S-Plus
and other programs with built-in decision tree functions, such as
SPSS and SAS, is the algorithm used to determine the best node
splits. Improvement of node split methods for decision trees and
other DM tools is ongoing. S-Plus uses reduction in deviance as
the measure to find the best discriminator at each tree node.

Many other node split selection techniques exist, but past results
have shown that no single method is superior to the others
(Mingers, 1989). Therefore, even though useful and consistent
results were found in this study, it is suggested that other tree
methods be considered for comparison. S-Plus tree methods were
adopted for this study because they were readily available and
gave encouraging results. It is proposed that comparisons be made
using the OC1 and C4.5 decision tree routines, which are readily
available for research purposes. They are written in the C
language for operation on Unix platforms (Marmelstein, 1999).


6.3 Applications of Data Mining

Decision trees are one of the main data analysis tools used in DM
today (Brodley et al., 1999; Murthy et al., 1994). Applications of
DM tools are being introduced today in many fields. Past
applications of decision trees have been significant; some of
these include:


Astronomy:

For filtering noise from Hubble Space Telescope images
(Salzberg et al., 1995).

Remote Sensing:

For automatic pattern recognition and categorization of earth
science data (Rymon, R. and N.M. Short, Jr., 1994).

Hierarchical decision tree classifiers in high-dimensional and
large class data (Byungyong, K. and D. Landgrebe, 1991).

Weather Prediction:

Experts in the field of DM are continuously searching for data to
exploit. It appears that weather prediction is a relatively new
venue for DM and that great potential exists, especially for
military weather operations.

The preceding are just a few examples. Many other real-world
applications exist, especially in the bioengineering and medical
professions. An interesting example worth mentioning is the
application of decision trees to DNA identification by S. Salzberg
(1995). In his dissertation, Salzberg applied classification trees
to DNA sequences. These sequences involved thousands of base
pairs, of which the portion of interest, the part of the DNA that
codes for proteins, occupied only a small percentage. He found
that decision trees outperformed any other technique used for this
task at the time.

His conclusion was that decision trees are "a highly effective
tool for identifying protein-coding regions." Decision trees, so
to speak, can find the needle in a haystack, and are highly
effective and efficient at it.


6.4 Data Mining in Weather Prediction

DM techniques may be applied to generate a more reliable set of
decision rules for weather prediction, saving resources and
potentially lives (Marmelstein, 1999).

A logical opportunity exists here. During a preliminary computer
study of 328 tornado cases, the Air Weather Service and the
National Severe Storms Forecast Center concluded that 14 weather
parameters played an important role in the production of severe
thunderstorms and tornadoes. This study was conducted prior to
1972, and the parameters chosen were given in order of importance
based on computer analysis and forecast experience (Koceilski).
The conclusion was that the stability of the atmosphere (easily
determined by the indices) was the second most important parameter
involved. Some logical questions to ask today would be "Would this
still be true today?" or "Were the datasets used back then
comprehensive enough?" There were extreme limitations in the data
analysis resources and manual techniques available back then. It
would be relatively easy to conduct a much more comprehensive
search for important weather parameters through the use of DM
tools.

With large databases of weather measurements built up over the
years, many useful applications in meteorology may be found by
using DM techniques. Decision trees were designed to handle
copious amounts of data for quick and efficient calculation and
display. More generally, decision trees are a series of tests
organized in a tree-like structure, where each test at a node
split is equivalent to a linear discriminant as in normal
regression. In other words, the number of iterations for normal
regression is equivalent to the number of nodes in a decision
tree. But, unlike normal regression, combinations of nominal and
ordinal data may be used as predictors. This is one of the
dominant advantages of decision trees over normal statistical
regression models. Decision trees have the inherent ability to
choose the best predictors among a multivariate set for the given
task. As will be seen, the trees grown for this study were quite
small because the most significant features were of primary
concern. A relatively small tree is also easy to use and to
understand as a forecast tool. Decision tree experts say one
should prefer the simplest model that fits the data
(Bishop, 1995).


6.5  S-Plus®  Model  Used  in  this  Study 

The recursive-partitioning algorithm underlying the decision tree
function in S-Plus tries to choose the most significant two-way
split, partitioning each predictor variable (index) into
increasingly homogeneous regions by reduction in deviance. The
result determines not only the most important index among the
others as a predictor for each location, but also the most precise
threshold value. To visualize this, imagine a scatter plot of each
index divided so that at any node, the split that maximally
distinguishes or categorizes the response variable in the left and
right branches is selected. This process is done recursively on
each separate predictor variable and determines which index is the
single best predictor (using the reduction in deviance goodness of
fit measure) for the assigned tree node split.

By applying the reduction in deviance measure, the amount of
overlap between the categories (misclassification error rate) is
minimized. The average misclassification error rate for most
locations ranged between 25-30%, which is similar to the logistic
regression classification results. These misclassification error
rates mean that the best results identified are incorrect 25-30%
of the time.

This is the best decision tree model fit that can be expected
given the large variances seen in the indices.

The tree model used in S-Plus for a classification tree assumes
that the response variable follows a multinomial distribution. The
multinomial distribution is a natural extension of the binomial
distribution that allows any finite number of categories instead
of just two.

The two types of decision trees used in this study were
classification and regression trees. Both were useful tools for
predicting CG lightning activity. If the response variable is a
factor (categorical), such as t-storm/no t-storm (none), the tree
is called a classification tree. If the response variable is
numeric, such as CG lightning counts, then a regression tree is
calculated.

The decision tree algorithm follows three simple steps to
determine or "fit" the best results:

1. Split the set of predictors (indices) using the goodness of fit
measure (S-Plus uses reduction in deviance). The reduction in
deviance computed for each potential split is similar to the
log-likelihood used in logistic regression (for classification
trees) and in Poisson/logit regression (for regression trees).
These tests are done recursively on each predictor: TTI, KI,
SWEAT, etc.

2. Check the results of each split comparison. Find the best split
for each index; if every partition is pure, meaning all cases in
the partition belong to the same class (none or t-storm), then
stop. Label each leaf node with the name of the best class and
threshold value.

3. Continue to recursively split any partitions that are not pure.
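
The three steps above can be sketched in a few lines of Python.
This is an illustrative outline only, not the S-Plus
implementation: the function names, the midpoint choice of
candidate thresholds, the `min_cases` stopping rule, and the toy
KI/LI values and labels are all assumptions made for the example.

```python
import math
from collections import Counter

def deviance(labels):
    """Multinomial node deviance: -2 * sum(n_k * log(n_k / n))."""
    n = len(labels)
    return -2.0 * sum(c * math.log(c / n) for c in Counter(labels).values())

def best_split(rows, labels, names):
    """Return (reduction, name, threshold) for the best single-index split."""
    parent, best = deviance(labels), None
    for j, name in enumerate(names):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0                      # midpoint threshold
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            gain = parent - deviance(left) - deviance(right)
            if best is None or gain > best[0]:
                best = (gain, name, t)
    return best

def grow(rows, labels, names, min_cases=2):
    """Recursively split until a node is pure or too small to split."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(labels) < min_cases:
        return majority                              # leaf: class label
    split = best_split(rows, labels, names)
    if split is None or split[0] <= 0.0:
        return majority
    _, name, t = split
    j = names.index(name)
    left = [(r, y) for r, y in zip(rows, labels) if r[j] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] >= t]
    return {
        "split": f"{name}<{t}",
        "left": grow([r for r, _ in left], [y for _, y in left], names, min_cases),
        "right": grow([r for r, _ in right], [y for _, y in right], names, min_cases),
    }

# Hypothetical toy cases: (KI, LI) pairs with made-up labels.
names = ["KI", "LI"]
rows = [(20.0, 4.0), (22.0, 5.0), (25.0, 1.0),
        (30.0, -2.0), (32.0, 3.0), (35.0, -4.0)]
labels = ["none", "none", "none", "t-storm", "t-storm", "t-storm"]
tree = grow(rows, labels, names)
print(tree)
```

On this toy data a single KI threshold separates the two classes
completely, so the recursion stops after one split. Real index
data overlap heavily, which is why practical trees stop on size or
deviance criteria rather than purity.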


Figure 27 is an example of the straightforward graphical display
of S-Plus classification tree output for 00Z OUN. The lengths of
the tree branches are proportional to the significance of each
classified split, which equates to the amount of reduction in
deviance.

Also displayed is a burl plot of each index at the first split,
which summarizes the goodness of fit for each predictor at the
model's parent node (KI<27.75). The goodness of fit for each
predictor in the model is the difference in deviance between the
current node and its offspring nodes. The burl plot draws a single
vertical line for each potential split, which is used to determine
the best threshold value. Reduction in deviance is plotted against
each potential split, with the most significant split in this case
occurring at a K-Index of 27.75. The worst predictor at this level
is by far the KO-Index, as shown by the diminutive vertical extent
of its burl plot. The KO-Index at the parent node appears to have
nearly no prediction capability (reduction in deviance) at any of
the potential splits. However, the KO-Index is not excluded from
any potential future splits. Indeed, the burl plot at the
KO<-16.0729 node in Figure 28 shows that, although not as decisive
as KI was at the parent node, KO has the most predictive potential
at that level in the tree compared to the other indices. Further
analysis of the data is needed to determine whether this split has
any prediction potential.

Figure 29 displays the burl plot at the SWEAT<230.5 node
(right-side branches of the classification tree). KI and LI also
appear to have predictive abilities, but the most significant
reduction in deviance (highest and steepest peak) is at the
SWEAT<230.5 split.

Figure 27. Example classification tree output for 00Z OUN with
burl plots of the first tree node split.

Figure 28. Burl function for 00Z OUN at the KO<-16 tree node
split.

Figure 29. Burl function for 00Z OUN at the SWEAT<230.5 tree node
split.


6.6 Classification Tree Summary Output


The actual S-Plus summary tree output in Table 10 is the
non-graphic display of the same tree in Figure 27. The significant
features are highlighted for simplicity. The classification tree
results are not as easily discernible as in the graphic, but more
detail about what is going on at each non-terminal node branch is
available. An explanation of the summary tree output for 00Z OUN
in Table 10 follows.

The Parent NODE is split into the first two branches (none and
t-storm NODE), which are labeled as nodes 2) and 3) respectively.
This first split is the most significant, with the significance of
all the remaining splits depending on the homogeneity of
subsequent index threshold splits, if any. In this example, there
are three subsequent splits after the first split (five terminal
nodes), their significance depending upon further analysis.

KI<27.75 is the most significant threshold for predicting no CG
lightning occurrence (none). There were 335 cases in the none
category split, with accuracy near 72%. When this is combined with
LI>3.0 in the next branch (node 5), the probability of no CG
lightning increases to 82%.

KI>27.75 is the most significant threshold for predicting CG
lightning occurrence (t-storm). There were 370 cases in the
t-storm category split, with about 61% initial accuracy. When this
is combined with SWEAT>230.5 in the next branch (node 7), the
probability of CG lightning increases to 72%.


Table 10. Actual S-Plus summary tree output for 00Z OUN.

*** Tree Model ***

Classification tree:
Number of terminal nodes: 5
Residual mean deviance: 1.234 = 863.8 / 700
Misclassification error rate: 0.33 = 238 / 705
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

Parent NODE:
 1) root 705 971.30 Parent NODE ( 0.5461 / 0.4539 )

none NODE:
 2) KI<27.75 335 397.70 none ( 0.7194 0.2806 )
   4) LI<3.00625 229 289.60 none ( 0.6725 0.3275 )
     8) KO<-16.0729 111 129.50 none ( 0.7297 0.2703 ) *
     9) KO>-16.0729 118 156.90 none ( 0.6186 0.3814 ) *
   5) LI>3.00625 106 99.69 none ( 0.8208 0.1792 ) *

t-storm NODE:
 3) KI>27.75 370 494.60 t-storm ( 0.3892 0.6108 )
   6) SWEAT<230.5 195 270.20 t-storm ( 0.4872 0.5128 ) *
   7) SWEAT>230.5 175 207.50 t-storm ( 0.2800 0.7200 ) *

As discussed previously, misclassification error rates (or costs)
are important for analyzing the significance of classification
trees. Misclassification costs become more significant when the
categorical counts are low. For this example, the counts can be
considered adequate for KI<27.75 (N=335) at the first tree node,
but at the LI>3.0 threshold in the next tree node the counts are
debatable (N=106, which is near the minimum deemed necessary for
significance). Note that a gain of less than 15% in category
prediction is revealed between LI<3.0 (node 4) and LI>3.0 (node 5)
(0.67 versus 0.82, respectively), which is not highly significant.

In summary, slight increases in the significance of the results
were found in the indices' ability to predict no CG lightning
events (within a 50 nm radius).


6.7 Determining a Significant Decision Tree

At this point it is important to mention that sometimes the
significant split determined by the tree may favor one
classification over the other. An inherent potential imperfection
of tree algorithms arises when the classes (none/t-storm) contain
a disproportionate number of cases or the total number of cases is
too small (Fickett, J. and C.S. Tung, 1992). Maximizing the
dataset for each location should alleviate this imperfection.

Fickett, J. and C.S. Tung (1992) found that decision trees tend to
optimize accuracy on the larger class of data. This appears to be
the case for this study as well, especially for some of the 12Z
dataset results at some locations, due to fewer t-storm category
occurrences for the 12-hour period. If annual data were
considered, an even more disproportionate number of none
classifications would result, since there are few to no CG strikes
in the "cooler" months.

For example, the maximized tree models at 12Z for most locations
typically showed an initial (parent node) split of 60/40
(none/t-storm) (see results displayed in Appendices A and B). This
may influence the 12Z tree results, since the optimum initial
category split would ideally be 50/50. Consequently, the most
significant split at 12Z LBF, for example, was discerned at a
questionable threshold, with LI ascertained as the most important
index at a threshold value of 4.0. Experience, and comparisons
made to nearby locations, suggest that this threshold value is
questionable. Also, the normal tendency to split the first node
into decisive categories was anomalous in that it chose a higher
than normal probability for none but a disproportionately lower
than normal probability for t-storm. The first node prediction at
12Z LBF for t-storm was lower than that for none, resulting in
both first node classifications being none, which is not ideal for
the first category split. These facts again support the
justification for combining months 5-9, to "equalize" the
categories and "clean up" the data.

In Table 11 the summary statistics for each predictor
indicate that the best split/threshold values are not simply
the mean or median. In fact, for 00Z OUN the tree model
establishes the split based on its goodness of fit, which
in S-Plus is the best reduction in deviance.
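
The reduction-in-deviance criterion can be illustrated with a
minimal sketch. This is not the S-Plus implementation: it
assumes a two-class response (none/t-storm), uses the binomial
deviance D = -2 * sum(n_k * ln(n_k / n)), and scans midpoints
between sorted predictor values. The function names and the
K-index values below are hypothetical.

```python
import math

def node_deviance(labels):
    """Binomial deviance of a node: -2 * sum over classes of n_k * ln(n_k / n)."""
    n = len(labels)
    dev = 0.0
    for cls in set(labels):
        n_k = labels.count(cls)
        dev += -2.0 * n_k * math.log(n_k / n)
    return dev

def best_split(values, labels):
    """Return (threshold, deviance_reduction) for the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    parent = node_deviance(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x < cut]
        right = [y for x, y in zip(values, labels) if x >= cut]
        reduction = parent - node_deviance(left) - node_deviance(right)
        if reduction > best[1]:
            best = (cut, reduction)
    return best

# Invented K-index values and outcomes, for illustration only.
ki = [8.0, 12.0, 18.0, 22.0, 24.0, 26.0, 31.0, 33.0, 36.0, 38.0]
outcome = ["none"] * 6 + ["t-storm", "none", "t-storm", "t-storm"]

threshold, gain = best_split(ki, outcome)
print(threshold, round(gain, 2))
```

In this toy sample the chosen threshold (28.5) is neither the
mean (24.8) nor the median (25.0) of the predictor, which is
the behavior the paragraph above describes.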

The most significant index as a predictor was KI, with a
threshold value of 27.75, which lies between the mean and
the median. Also note the significantly higher numbers
of missing values for SWEAT, KO, and CAPE, which are
excluded from the maximized tree models.


Table 11. Summary statistics for 00Z OUN.

Summary (OUN - 00Z)

Index      Median       Mean      Missing
TTI         46.40       45.77        37
KI          28.40       25.55        38
SWEAT      203.5       218.2        207
KO         -13.170     -12.220      223
SSI         -0.249       0.319       37
CAPE      1333.0      1548.0        236
LI          -1.3590     -0.6466      33

6.8 The Significance of Missing Data


Missing values can occur either in data used to build
trees, or in a set of predictors for which the value of the
response variable is to be predicted. There were no missing
days in the CG lightning summary data, so after merging the
two data sets (indices and CG lightning), missing data were
found for the indices only. Like logistic regression,
tree regression permits missing data only in predictor
variables. Missing data can distort results if there is a
consistent underlying reason why the values were not
calculated in the dataset. For
the purpose of generating a decision tree with the most
unambiguous set of classifications, Marmelstein (1999)
suggests "filling in" each missing case of the dataset with
a fixed value, using an imputation method to "repair" it.
S-Plus has a built-in feature that enables the user to
automatically eliminate cases with missing attributes or to
replace them with a new factor variable that has an added
level named "NA" for the missing values. It leaves numeric
predictors alone even if they contain missing values. Since
this study is based on real-world results in a
climatological framework, any manipulation or "imputation"
method would not be desirable. Instead, methods were
researched to maximize the database for each location. As a
result, the "full-model" approach might be waived in favor
of an approach that maximizes the usable data for the
model while remaining consistent with the results.
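
The trade-off between the two approaches comes down to how
many cases survive once rows with a missing predictor are
dropped. A minimal sketch, assuming simple complete-case
deletion; the records are invented, None marks a missing
index value, and the helper name is hypothetical.

```python
FULL = ("KI", "SSI", "LI", "TTI", "SWEAT", "KO", "CAPE")
MAXIMIZED = ("KI", "SSI", "LI", "TTI")

def complete_cases(records, predictors):
    """Keep only records in which every listed predictor is present."""
    return [r for r in records
            if all(r.get(name) is not None for name in predictors)]

# Invented soundings for illustration; None marks a missing index.
records = [
    {"KI": 28.4, "SSI": -0.2, "LI": -1.4, "TTI": 46.4,
     "SWEAT": 203.5, "KO": -13.2, "CAPE": 1333.0},
    {"KI": 31.0, "SSI": -2.1, "LI": -3.0, "TTI": 49.0,
     "SWEAT": None, "KO": None, "CAPE": None},
    {"KI": 12.5, "SSI": 4.0, "LI": 3.2, "TTI": 41.0,
     "SWEAT": 150.0, "KO": -2.0, "CAPE": None},
    {"KI": None, "SSI": 1.0, "LI": 0.5, "TTI": 44.0,
     "SWEAT": 180.0, "KO": -8.0, "CAPE": 600.0},
]

n_full = len(complete_cases(records, FULL))            # all seven indices
n_maximized = len(complete_cases(records, MAXIMIZED))  # four indices only
print(n_full, n_maximized)
```

Restricting the predictor set to the indices with few gaps
keeps more cases in the model, at the cost of excluding
SWEAT, KO, and CAPE as candidate predictors.
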

6.9 Determining Significant Results


Classification (decision) trees have a tendency to
over-fit the data. Breiman et al. (1984), whose
work on CART is referenced often in the data mining
literature and is the basis for the tree technique used
in this study (S-Plus), determined methods for best tree
development. For best results, the tree should first be
over-fitted, in other words grown too large, so as not to
miss any key splits that may be hidden. This yields very
low to near perfect misclassification error rates, but at
the expense of reliability. The reasoning is that the data
may show a downward trend or an insignificant reduction in
deviance, and then show a significant trend "hidden" further
down the tree. After the tree is grown it should then be
"pruned" back for the best fit. Finding the best fit is
rather subjective because it depends on the nature of the
data being used. In this case we are looking at the data
sample in a real-world climatological sense, so the interest
is in the key splits, the most important ones. We are
interested in high probabilities with high occurrences for
best model accuracy. It follows that if a node is split
with a case count well above a minimum requirement (say 100
cases), then that node must be highly significant. Most
decision tree software has built-in pruning functions to
"prune back" the insignificant nodes. One solution to the
problem of over-fitting is to reduce or limit the size of
the tree in some manner. Over-training can be alleviated in
S-Plus by requiring a minimum number of cases before a split
is considered. It was found that a minimum of 100 cases
worked best to maximize results and minimize over-fitting.
Results were maximized in that none of the key variables or
node splits were missed or left out; as will be shown in
the output example, more insignificant or less accurate
splits appear once node sizes fall to about 100 cases.
This is supported by the fact that a noticeable number of
violations of index stability trends were present for nodes
with observations near or below 100. For example, a node
further down the 12Z tree modeled for FWD (Fort Worth, TX)
with fewer than 100 observations produced a split that
indicated a higher probability of classifying t-storm when
KO > -11.9 (see Appendix A). This is normally considered
the more stable direction for the KO Index. The result is
most likely a spurious trend, because there were probably
not enough cases involved to show consistent results.
Another possible explanation is that lower/upper bounds may
exist, but in any case these would not improve the
results significantly.
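
The effect of requiring a minimum number of cases before a
split is considered can be sketched as follows. This is a toy
single-predictor grower written for illustration, not the
S-Plus routine; the data are synthetic and all names are
hypothetical. Only the 100-case minimum comes from the text.

```python
import math

MIN_CASES = 100  # minimum node size before a split is considered

def deviance(labels):
    """Binomial deviance of a list of class labels."""
    n = len(labels)
    return sum(-2.0 * labels.count(c) * math.log(labels.count(c) / n)
               for c in set(labels))

def grow(rows, min_cases=MIN_CASES):
    """Grow a one-predictor tree; rows are (index_value, label) pairs.

    A node with fewer than min_cases rows is never split, so small
    nodes become leaves instead of producing unreliable thresholds.
    """
    labels = [y for _, y in rows]
    node = {"n": len(rows),
            "p_tstorm": labels.count("t-storm") / len(rows)}
    if len(rows) < min_cases or len(set(labels)) == 1:
        return node
    values = sorted({x for x, _ in rows})
    best_cut, best_gain = None, 0.0
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        left = [y for x, y in rows if x < cut]
        right = [y for x, y in rows if x >= cut]
        gain = deviance(labels) - deviance(left) - deviance(right)
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    if best_cut is None:
        return node
    node["cut"] = best_cut
    node["left"] = grow([r for r in rows if r[0] < best_cut], min_cases)
    node["right"] = grow([r for r in rows if r[0] >= best_cut], min_cases)
    return node

# Synthetic data: 200 cases, two "noisy" none cases inside the t-storm range.
rows = [(float(x), "none" if x < 120 or x in (130, 150) else "t-storm")
        for x in range(200)]

tree = grow(rows)                    # 100-case minimum
deeper = grow(rows, min_cases=10)    # permissive setting, for contrast
print(tree["cut"], tree["right"]["n"], "cut" in tree["right"])
```

With the 100-case minimum the 80-case right-hand node is left
as a leaf; with a permissive minimum the grower keeps
splitting it to chase the two noisy cases, which is the kind
of small-sample split the text treats as unreliable.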

Another aid in choosing the proper size of a tree is
displayed in Figure 26, which plots the reduction in
deviance against the number of tree nodes. Again, the most
significant reduction in deviance is obtained in the first
split. Subsequent splits up to node 5 still show notable
reductions in deviance, and gains appear minimal
thereafter. It was determined that the most significant
information obtained from the classification tree results
was within the first 5 nodes, with preference
given to nodes holding at least 100 cases, which typically
fell within the first 3 nodes.

Marmelstein (1999), Breiman et al. (1984), Mingers
(1989), and others suggest that, because of the various
methods used in node split selection, it is best to
cross-validate the results for consistency. The "validity"
of tree methods can be tested by
removing a portion of the data before the tree is grown (or
trained), then growing the tree on both data sets and
comparing the results.
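
A minimal sketch of this hold-out comparison is given below.
It assumes a single predictor and a deviance-based first
split, and it uses synthetic, cleanly separable data so the
outcome is easy to check; the function names, the 20 percent
test fraction, and the random seed are all illustrative
choices rather than settings used in this study.

```python
import math
import random

def deviance(labels):
    """Binomial deviance of a list of class labels."""
    n = len(labels)
    return sum(-2.0 * labels.count(c) * math.log(labels.count(c) / n)
               for c in set(labels))

def first_split(rows):
    """Best single threshold (by reduction in deviance) for (value, label) rows."""
    labels = [y for _, y in rows]
    values = sorted({x for x, _ in rows})
    best_cut, best_gain = None, 0.0
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        left = [y for x, y in rows if x < cut]
        right = [y for x, y in rows if x >= cut]
        gain = deviance(labels) - deviance(left) - deviance(right)
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut

def holdout(rows, test_fraction=0.2, seed=1):
    """Shuffle reproducibly and return (training_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Synthetic data: t-storm whenever the index exceeds 25.
rows = [(i / 10.0, "t-storm" if i / 10.0 > 25.0 else "none")
        for i in range(500)]

train, test = holdout(rows)
cut_train, cut_test = first_split(train), first_split(test)
print(len(train), len(test), cut_train, cut_test)
```

If the thresholds found on the two subsets agree closely, the
split is stable; a large disagreement would suggest that the
split depends on which cases happened to be included.
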

Figure 30. Reduction in deviance versus node size plot for
00Z OUN.


The amount of data to set aside for comparison is
highly subjective, but the smaller the test sample, the less
likely it is that consistent results will be obtained. Some
suggest 10 to 20 percent as a test sample for large
databases, while others suggest 40 to 50 percent, if
possible. For this study, the full model (SWEAT, KO, and
CAPE included) served as the comparison: its total case
counts for each location were at least 30% lower because of
the missing data for these indices. Consistencies between
the split selections of both models for each location
indicated very stable results and a
high degree of confidence in the outcome. These
consistencies are easily determined by comparing the
graphical results of both models in Appendix C and D.

For the decision trees used in this study, high
probabilities combined with high occurrences signify very
significant results as a forecast tool. Higher credence
should therefore be given to the first split, and then only
to those subsequent splits that retain a large occurrence
(N) count and are consistent with surrounding sites. It
appears that results with counts near 250-300 cases should
be considered highly effective. More confidence can be
placed in lower counts nearing 100 cases if the threshold
values of the index remain consistent with surrounding
locations that have higher counts. For example, at 00Z OUN
(see Appendix B, 00Z Full Model Results - NO T-Storm),
KI<10.8 remains a consistent predictor threshold value for
NO T-Storm at many surrounding locations: OAX: KI<11.1,
TOP: KI<10.6, RAP: KI<10.6, and LZK: KI<11.9. Even though
the case counts dwindle to near 100 cases, these
consistencies have far-reaching implications for the
significance of the index and the associated threshold
values determined by the tree model. KI values near 11.0
therefore indicate a compelling threshold for
predicting a non-event, with accuracy nearing 90% at those
locations. In the next section the most significant tree
results are compared.
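
The consistency check across neighboring sites amounts to
simple arithmetic on the thresholds quoted above. In the
sketch below the OAX value is read as 11.1 (the scanned text
is garbled at that point), and the helper name and the 1.0
tolerance are illustrative assumptions.

```python
from statistics import mean

# NO T-Storm KI thresholds quoted in the text for 00Z.
# The OAX entry is read as 11.1; the source scan is unclear there.
ki_thresholds = {"OUN": 10.8, "OAX": 11.1, "TOP": 10.6,
                 "RAP": 10.6, "LZK": 11.9}

def regional_consistency(thresholds, tolerance):
    """Return (mean threshold, spread, sites farther than tolerance from the mean)."""
    values = list(thresholds.values())
    center = mean(values)
    spread = max(values) - min(values)
    outliers = sorted(site for site, v in thresholds.items()
                      if abs(v - center) > tolerance)
    return center, spread, outliers

center, spread, outliers = regional_consistency(ki_thresholds, tolerance=1.0)
print(round(center, 2), round(spread, 2), outliers)
```

The five thresholds average 11.0 and span 1.3 index units,
with no site more than 1.0 from the mean, which is the sense
in which KI values near 11.0 are described as consistent.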




6.10  Decision  Tree  Results 


Appendix A consists of a transformation of the less
user-friendly textual S-Plus tree output, as seen in
Appendix C, into an easily read forecast tool.
References are made to a "maximized" tree model and a "full"
model. The full model includes all of the indices in the
tree calculations, resulting in data loss caused by missing
data in some of the indices (see section on missing data).
SWEAT, KO, and CAPE have the highest missing data rates
(over 40% at some locations), so a maximized model was
developed that included only KI, SSI, LI, and TTI. The full
model is included for cross-validation purposes in Appendix
B. The full model may also serve as the more significant
model at some locations for the regression tree results,
owing to the importance of SWEAT and CAPE as CG
lightning count predictors at those locations.

The tree model results were analyzed for each location,
each time period (00Z/12Z), and for both full and maximized
models. The most significant results were then quantified
in Appendix A. These summaries are displayed in
tabular and graphical form and give
forecasters a user-friendly interface for interpreting the
index threshold results by geographical region.

Table  12  is  a  summary  of  the  classification  tree 
results  for  the  maximized  model  at  00Z  OUN  and  LBF  which  are 
tabulated  in  Appendix  A. 


Table 12. Sample classification tree results at 00Z from
Appendix A for maximized dataset.

LBF (1037)
  T-Storm              N       P
  if  SSI<1.1         585     0.69
  &   KI>30.6         305     0.78
  &   TTI>52          147     0.86
  No T-Storm           N       P
  if  SSI>1.1         452     0.76
  &   SSI>5.6         152     0.84
  &   TTI<42.9        122     0.76

OUN (1009)
  T-Storm              N       P
  if  KI>25.2         605     0.56
  &   LI<-1.1         402     0.63
  &   KI>35           157     0.74
  No T-Storm           N       P
  if  KI<25.2         404     0.75
  &   TTI<46.7        293     0.8
  &   KI<10.8         107     0.87

Next to the site identifiers (OUN or LBF) in Table 12
is the total number of cases available for calculation by
the tree model (in parentheses). Noticeable decreases in
the total number of available cases are seen in the full
model results because of the missing cases mentioned
earlier. Below the location identifiers are the two tree
classes determined by the initial split: T-Storm and No
T-Storm. These represent the occurrence and non-occurrence
of a CG lightning event within 50nm for the valid time
period. The first index listed below the T-Storm category
gives the most significant index and its threshold value:
KI>25.2 for OUN and SSI<1.1 for LBF. This is the first
(parent) split in the tree; therefore the same index is
listed under the No T-Storm category as well. The N and P
columns are the number of cases at that tree branch (node)
and the probability of occurrence for that category,
respectively. Notice that the N cases from the parent split
add up to the total number of cases for that location. Each
index is listed by importance and is inclusive; such is the
hierarchical nature of the classification tree. This
inclusiveness is the reason for the "if" and "or if"
statements labeled next to each threshold index. Each
combination listed leads to an increased probability of
categorical occurrence, but is valid only if each condition
occurs together with the others when preceded by an "&"
symbol.

The results for OUN in Table 12 should be read as follows.

A total of 1009 cases were classified.

The most significant index and threshold value for
classifying T-Storm is KI>25.2, for which there were 605
cases. At this threshold, CG strikes occurred within 50nm
56% of the time. On the other hand, if KI<25.2 (NO T-Storm
category), then no CG lightning strikes occurred in 75% of
the 404 cases. Notice that the first split at
this location was somewhat offset, favoring the NO T-Storm
category. To improve the odds we move to the next
"inclusive" branch on the T-Storm side of the tree
output, which suggests that if KI>25.2 & LI< -1.1, then 63%
of the 402 cases included the occurrence of a CG strike
within 50nm. By combining LI< -1.1 with KI>35 (the next
tree node), the total number of cases dwindles to 157, but
in 74% of these 157 cases there was a CG lightning
strike within 50nm. The probability for the No T-Storm
category increases to 87% when TTI<46.7 and KI<10.8 both
occur. This tree split combines a rather stable KI value
(<10.8) with TTI<46.7; the combination occurred only 107
times, but with a significant probability (87%).
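
The reading of Table 12 described above can be written as a
short lookup. The thresholds, case counts, and probabilities
are transcribed from the OUN rows of Table 12; the function
names and the handling of a value exactly at the parent
threshold (returned as undetermined) are assumptions made for
this sketch.

```python
# Rule chains transcribed from Table 12 (00Z OUN, maximized model).
# Each entry: (index name, comparison, threshold, N cases, probability).
OUN_RULES = {
    "T-Storm": [("KI", ">", 25.2, 605, 0.56),
                ("LI", "<", -1.1, 402, 0.63),
                ("KI", ">", 35.0, 157, 0.74)],
    "No T-Storm": [("KI", "<", 25.2, 404, 0.75),
                   ("TTI", "<", 46.7, 293, 0.80),
                   ("KI", "<", 10.8, 107, 0.87)],
}

def satisfied(value, op, threshold):
    """Evaluate one 'index > threshold' or 'index < threshold' condition."""
    return value > threshold if op == ">" else value < threshold

def classify(indices, rules=OUN_RULES):
    """Follow the inclusive ("&") chain as far as the sounding allows.

    Returns (category, N, P) for the deepest rule whose condition, and
    every condition above it, is met. Returns None when neither parent
    split applies (a value exactly at the parent threshold).
    """
    for category, chain in rules.items():
        result = None
        for name, op, threshold, n, p in chain:
            if not satisfied(indices[name], op, threshold):
                break
            result = (category, n, p)
        if result is not None:
            return result
    return None

print(classify({"KI": 36.0, "LI": -3.0, "TTI": 50.0}))
print(classify({"KI": 30.0, "LI": 1.0, "TTI": 48.0}))
print(classify({"KI": 9.0, "LI": 2.0, "TTI": 44.0}))
```

A sounding that satisfies only the parent condition receives
the parent probability; each additional "&" condition that is
met moves the result to a smaller but more decisive node.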

6.11  Regional  Summary  Results 

For the maximized tree model at 00Z, KI was the best
predictor of CG strike occurrence within 50nm (T-Storm). KI
was typically either the most significant or the second most
significant index by location at 00Z, with threshold values
ranging from 25 to 30 for all locations, probabilities near
70%, and high case counts (N). Best results were obtained at
LBF, OAX, and DDC, where thresholds of KI>30.5 gave
probabilities near 80% with case counts exceeding 300, which
were deemed very significant. At OAX the case count was
N=209, but OAX has a lower number of total cases (N=800)
than LBF and DDC (N=1037 and N=1012, respectively)
because of sounding data availability problems prior to
1995 at OAX.

Highest  probabilities,  roughly  75-85%,  for  CG  strike 
occurrences  were  obtained  when  a  TTI  near  50.0  was  combined 
with  other  indices  at  many  locations. 

It is interesting to note that AMA and RAP were the
only locations with LI as the lone significant predictor at
00Z, with AMA requiring a slightly more unstable value (LI<
-0.4). The results were fairly significant at these
locations, with initial probabilities of 75% increasing to
near 90% when combined with other indices (at very unstable
threshold values). One hypothesis for this is their High
Plains location. Both of these locations lie near or
above 1000 feet in elevation (Table 1), the
highest of all locations used in this study. Elevation
seems to have some influence on the significance of the
other indices at these locations. LI is calculated using the
average mixing ratio in the lowest 3,000 feet of the
sounding. Other indices, like the closely related SSI,
strictly use 850mb readings. There is some indication here
that the 850mb measurements are inadequate for revealing a
relationship between the indices at these elevations. The
lowest 3,000 feet method appears more plausible for the
higher elevations.

Typically, higher initial probabilities (first tree
split) resulted for the NO T-Storm category. This matches
the way weather forecasters have traditionally used the
indices. Experience tells a forecaster that stable
values of indices indicate a low probability of thunderstorm
occurrence with a high degree of confidence. However,
unstable values of indices usually signify to a forecaster
that further analysis is required. What is revealed here
are the significant threshold values at which a forecaster
might be able to eliminate the need for a thunderstorm
analysis. Initial classification probabilities for NO
T-Storm ranged from 75-80%, versus 60-70% for T-Storm
classifications.

6.12 Regression Tree Results


Comparisons of each model are important for the
regression tree results because the number of cases involved
is significantly lower for both the maximized and the full
model, since only cases involving CG lightning strike events
were considered. The regression tree results were developed
as a forecast tool to help indicate the likelihood of either
an active or a non-active CG lightning event. It appears
that at some locations the CAPE and SWEAT indices are the
most significant predictors of the expected "activity" of CG
lightning events. The regression tree results tabulated and
displayed in Appendix A should be used by forecasters after
they have first determined, via the classification tree
results, that there is a high probability of CG lightning
strikes within 50nm for the next 12-hour forecast period.

Table  13  is  a  summary  of  the  maximized  model  regression 
tree  results  for  00Z  OUN  and  LBF  that  are  tabulated  in 
Appendix  A. 


Table 13. Sample regression tree results at 00Z from
Appendix A for maximized dataset.

  mean      N      OUN (437)        N      mean
   439     334     LI = -4.3       103     1455
   227     129     TTI = 45.8      205      572

  mean      N      LBF (516)        N      mean
   409     386     SSI = -4.1      130     1191
   279     267     SSI = -1.83     119      699


Next to OUN in Table 13, in parentheses, is the
total of 437 cases of CG lightning strike events available
to the regression tree. The primary index and threshold
value determined was LI = -4.3. It is this threshold
value that best discriminates between a more active and a
less active CG lightning strike event. At the split
(LI = -4.3) the mean CG strike count for the N=103 events
was 1455 when LI <= -4.3. It would be confusing to say that
the values greater than the threshold index are on the
right in this case. Rather, the higher mean CG lightning
strike counts are on the right, corresponding to more
unstable index values; in other words, more unstable LI
values are more negative. The mean CG lightning strike
count at OUN was 1455 when LI values were less than -4.3
and 439 for LI values greater than -4.3. This indicates
that CG lightning was, on average, more than three times as
active on days when LI was less than -4.3 (1455 versus 439).

Another index and threshold value exhibiting potential as a
useful predictor was TTI=45.8, taken from the next
node of the same tree. In this case it is less confusing to
say that TTI values greater than 45.8 signify a more
significant CG lightning strike event, since higher TTI
values are more unstable. Here the mean CG lightning strike
count was about 2.5 times higher (572 versus 227).
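
The two comparisons quoted above (439 versus 1455 and 227
versus 572) can be expressed either as a ratio or as a
percent increase, and the two are easy to conflate. A small
sketch of the arithmetic, using the mean counts from Table 13
and a hypothetical helper name:

```python
def activity_ratio(mean_unstable, mean_stable):
    """Return (ratio, percent increase) of mean CG strike counts."""
    ratio = mean_unstable / mean_stable
    return ratio, (ratio - 1.0) * 100.0

# OUN 00Z means from Table 13.
li_ratio, li_pct = activity_ratio(1455, 439)    # LI <= -4.3 versus LI > -4.3
tti_ratio, tti_pct = activity_ratio(572, 227)   # TTI > 45.8 versus TTI <= 45.8

print(round(li_ratio, 2), round(li_pct))
print(round(tti_ratio, 2), round(tti_pct))
```

For the LI split the mean count on the unstable side is about
3.3 times the stable-side mean, which is an increase of
roughly 231%; for the TTI split the ratio is about 2.5, an
increase of roughly 152%.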

It should be noted that the regression tree output in
this study indicated significantly high deviance values for
all locations. This is to be expected with such a large
range in the index values for active and inactive CG
lightning events. The model may not explain a
significant amount of the variability in the data, but based
on the data presented in this study, it represents the most
significant results obtainable through the use of S-Plus
decision tree methodology. A Student's t-test revealed that
the means found were statistically different from what would
be expected if no relationship existed. The results
obtained were also consistent with customary trends and
threshold values of the stability indices used, and
statistically reveal the most significant features for
weather forecasters to concentrate on.

More unstable values of each threshold index are
required for the regression tree results to best determine
an active event. Perhaps these values also correlate
well with severe storm outbreaks. This is
left for future research.

VII. Conclusions and Recommendations


7.1  Conclusions 

This study reveals the feasibility of using
atmospheric stability indices to forecast the occurrence of
CG lightning activity for the "active" lightning months of
May through September (Objective 1). This study's approach
was empirical in nature and represents the likelihood of CG
lightning based on past occurrences. The
study first suggests a range of threshold values,
on an annual basis, that improves on those provided in the
past for general thunderstorm occurrences. These should be
implemented for the stability indices when predicting CG
lightning activity, which is closely related to
thunderstorm occurrence (Objective 2). The Midwest
upper-air stations studied were divided into northern and
southern regions, and a slight modification of the annual
threshold ranges was required for a few of the specific
indices, depending upon their location and sounding
observation time. These thresholds were most
useful in the northern Midwest, where the most constructive
indices were the LI, SSI, TTI, SWEAT, and CAPE. CG
lightning occurred 67-74% of the time when these
indices were within the determined thresholds. The KI and
KO indices had 56-57% accuracy, the utility of which is
questionable.

In contrast, the annual threshold ranges determined
for the southern plains region of the study barely exceeded
50% accuracy for any of the indices, which leaves the
established threshold ranges with hardly any predictive
ability. This region is more active in the winter months,
and as a result the false-alarm rate for this region is much
greater. The influence of CG lightning events in the
winter months is much stronger for the southern Midwest.
In fact, many of the locations in the northern Midwest had
few to no CG lightning events during the winter. Box
and whisker plots revealed that the indices were much more
variable in the winter months as well.

It  was  determined  that  due  to  the  seasonal  variations 
of  the  indices,  especially  for  the  southern  Midwest  region, 
the  active  months  (5-9)  should  be  examined  exclusively  for 
further  study.  It  should  be  noted  then  that  the  results 
for  the  rest  of  this  study  are  for  the  combined  active 
months  (5-9) . 

Linear and non-linear regression techniques were
applied next to examine the CG lightning data and stability
indices for any predictive relationships (Objective 3) that
would improve upon the threshold ranges determined earlier.
Stepwise linear regression eliminated all but a few
specific indices for the best model fit, but even then no
significant relationships were found.

Since traditional statistical methods failed to find
any significant relationships, new methods of predicting CG
lightning activity from stability indices were explored
using decision trees, a technique from the data mining field
(Objective 4). Reliable and significant results were
obtained, and a new predictive forecasting tool was
developed that allows weather forecasters to predict the
occurrence or non-occurrence of CG lightning events with an
average probability of 80-90% (Objective 5). The
most relevant indices and threshold values were determined
for each individual location and sounding time. Decision
trees implement an inclusive, or hierarchical,
classification approach while at the same time maximizing
the inclusive event counts. This inclusive approach means
that the significant probabilities found for one index
threshold hold only under the condition that the other
index thresholds listed with it are met as well.

Interestingly, the most significant indices and
threshold values determined for each location by the
decision trees follow a predictable geographic pattern. The
Lifted Index was determined best for use at the High Plains
locations (RAP and AMA) for both sounding times, in part
because of their higher station elevations. Next, the
Showalter Index was most significant for the northern
plains region of the study at 00Z. Further south, in the
more moist regions of the Midwest, the K-Index was the most
significant at 00Z, most likely because the
K-Index includes an extra moisture measurement at the
700mb level. This extra measurement at 700mb
also proved significant for the 12Z sounding times,
since the K-Index was the predominant index at
most locations at 12Z. This is explained by the
fact that the morning temperature inversions commonly found
at 12Z during the active months (5-9) in the Midwest cannot
be resolved by the 850mb temperature/moisture
measurements used by most of the other indices.
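
The index definitions themselves are not restated in this
chapter. As a reminder of why the K-Index responds to 700mb
moisture while an 850mb-based index does not, the sketch
below uses the standard textbook definitions of the K-Index
and the Total Totals index (assumed here to be what TTI
denotes), with temperatures and dewpoints in degrees Celsius.
The function names and the sample sounding values are
illustrative and are not data from this study.

```python
def k_index(t850, td850, t700, td700, t500):
    """K-Index: (T850 - T500) + Td850 - (T700 - Td700)."""
    return (t850 - t500) + td850 - (t700 - td700)

def total_totals(t850, td850, t500):
    """Total Totals: (T850 - T500) + (Td850 - T500)."""
    return (t850 - t500) + (td850 - t500)

# Two invented soundings that differ only in the 700mb dewpoint.
moist = dict(t850=20.0, td850=15.0, t700=8.0, td700=2.0, t500=-10.0)
dry = dict(moist, td700=-10.0)

ki_moist = k_index(**moist)
ki_dry = k_index(**dry)
tt_moist = total_totals(moist["t850"], moist["td850"], moist["t500"])
tt_dry = total_totals(dry["t850"], dry["td850"], dry["t500"])
print(ki_moist, ki_dry, tt_moist, tt_dry)
```

Drying the 700mb level lowers the K-Index by 12 units in this
example while Total Totals is unchanged, because only the
K-Index contains the 700mb dewpoint depression term.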

The classification tree results allow
forecasters to determine the probability of a CG lightning
event and, if a forecaster determines that CG lightning is
expected, the regression tree results allow the
forecaster to determine the potential frequency or
"amount" of CG lightning activity to be
expected. These results are displayed in a
user-friendly format by location and time, in both graphical
and tabular forms, as a ready-to-use forecast tool
(Appendix A). Regression tree results
give the most significant stability index and
threshold value for each location, above or below which
there was a 300-500% increase or decrease in mean CG
lightning activity. Again, only events
where CG lightning did occur were analyzed with the
regression trees, since the classification tree results
were first used to determine whether an event was expected.

The ability of a weather forecaster to predict the
occurrence or non-occurrence of CG
lightning generally exceeded the 80-90% level for all
locations analyzed, which has far-reaching implications.
Additionally, using stability indices to determine the
expected amount of CG lightning is unique. Therefore, the
results of this study should prove to be a useful forecast
tool in the operational environment.

7.2 Recommendations for Future Study


Other techniques for analyzing the datasets used during
the course of this study were discovered that could
ultimately improve upon the results, but time constraints
prohibited their implementation in this study. A
suggested approach is to compute forecast stability indices
from operational forecast models and compare them to
CG lightning activity in the same manner employed in this
study.

Another approach is to apply a specialized
predictor to the indices. One type of specialized
predictor is sometimes referred to as an interactive
predictor. Interactive predictors are especially important
when forecasting rare events such as severe thunderstorms
and tornadoes. One example of an interactive predictor used
to forecast thunderstorms is the KF predictor (Reap and
Foster, 1979), which is the KI multiplied by the
thunderstorm relative frequency. This predictor forces the
climatology (the relative frequencies) to be more
responsive to the current synoptic situation. In other
words, it applies a weighting factor empirically, based on
the past history of CG lightning strike probabilities.
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Similarly,  over  6  years  (93-00)  of  CG  lightning  data 
are  utilized  in  this  study  and  could  be  implemented  to 
create  monthly  frequency  distributions  of  CG  lightning 
strikes  (within  50nm) .  These  monthly  frequency 
distributions  might  be  useful  as  an  additional  input  to 
regression  analyses  (Reap  and  Foster,  1979) .  Also,  since 
this  study  demonstrated  that  decision  tree  analysis 
revealed  more  promising  results  than  regression  analysis. 
Table  14  suggests  an  example  method  to  be  used  in  the  same 
manner  decision  trees  were  employed  in  this  study. 


Table 14. Example modification of indices that could be offered as predictors to the screening classification/regression tree analyses.

KI multiplied by CG lightning relative frequency.

SWEAT index multiplied by CG lightning relative frequency.

TTI multiplied by CG lightning relative frequency.

LI multiplied by CG lightning relative frequency.
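The interactive predictors listed in Table 14 can be sketched in a few lines. The following is a minimal illustration only, assuming that the relative frequency is computed per calendar month as the fraction of soundings followed by CG lightning; the function names and the toy record are hypothetical and are not data or code from this study.

```python
from collections import Counter


def monthly_relative_frequency(lightning_months, sounding_months):
    """Fraction of soundings in each month that were followed by CG lightning."""
    events = Counter(lightning_months)
    totals = Counter(sounding_months)
    return {month: events[month] / totals[month] for month in totals}


def interactive_predictor(index_value, relative_frequency):
    """Weight a stability index by the climatological relative frequency."""
    return index_value * relative_frequency


# Hypothetical toy record, not data from this study:
# four May soundings (one with lightning) and four July soundings (three with).
sounding_months = [5, 5, 5, 5, 7, 7, 7, 7]
lightning_months = [5, 7, 7, 7]

freq = monthly_relative_frequency(lightning_months, sounding_months)
ki_times_freq = interactive_predictor(30.0, freq[7])  # KI = 30.0 on a July day
print(freq, ki_times_freq)
```

The same multiplication would apply to the SWEAT, TTI, and LI entries; the resulting products could then be offered to the tree analyses as additional candidate predictors.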

7.3  Future  Data  Mining  Applications 


There are many potential uses for data mining tools in weather research, since data mining techniques appear to be in their infancy in this field. Of particular relevance to this study are ways to utilize the stability indices as predictors for the occurrence of CG lightning strikes and the potential number of CG strikes that may be received.

The most important forecasting parameters, as summarized by Miller (1972) and as developed by the Air Weather Service and National Severe Storms Forecast Center (Koceilski), could easily be reviewed and revalidated with the use of data mining methods. Weak, moderate, and strong thresholds were suggested in the study, but new, more significant thresholds could still be discovered. Classification trees might, in fact, be capable of determining the single most important predictor and threshold value on which to focus a weather forecast analysis, assuming the database used is large enough for an empirical approach. Additional parameters could also be considered. Consistent results among data mining tools indicate to weather forecasters which weather parameters they should concentrate their analyses on. Benefits may also include substantial savings in analysis time as well as increased forecast accuracy.
7.4 Other Atmospheric Stability Indices to Consider

It  would  be  ideal  to  assess  the  potential  of  all 
available  atmospheric  stability  indices,  but  algorithm 
development  and  time  constraints  were  prohibitive  for  this 
study.  Some  of  the  indices  not  included  in  this  study  but 
which  are  suggested  for  future  study  as  predictors  of  CG 
lightning  activity  are: 

•  the  Fawbush-Miller  Stability  Index  (FMI) 

•  the  Martin  Index  (MI) 

•  the  Modified  Lifted  Index  (MLI) 

•  the  Bulk  Richardson  Number  (R) 

•  the  Dynamic  Index,  and  the 

•  Wet-Bulb Zero (WBZ) Height Index

The one index that was not available for this study but which would be a significant consideration for future research is the wet-bulb zero (WBZ) height, because of its recent utility in lightning research. Traditionally, WBZ heights are used to forecast hail, since certain threshold value ranges correlate well with large hail events at the surface (Miller et al., 1972). Miller showed that a large majority of the reported surface hail occurred when WBZ heights were between 5,000 and 12,000 ft above ground level (AGL), while large hail was most likely when WBZ heights were between 7,000 and 11,000 ft AGL. Again, the applicable ranges for these values, as for any other atmospheric stability index, vary by location and forecast regime and should be determined for individual locations.
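As a small illustration of how such WBZ height ranges could be applied, the sketch below encodes the two ranges quoted above as default parameters. The function name and the category labels are hypothetical, and the defaults would need to be replaced by locally determined ranges, as the text cautions.

```python
def wbz_hail_category(wbz_height_ft_agl,
                      hail_range=(5000, 12000),
                      large_hail_range=(7000, 11000)):
    """Classify a wet-bulb zero height (ft AGL) against hail ranges.

    The default ranges are the ones quoted in the text; the category
    labels are illustrative only.
    """
    low, high = large_hail_range
    if low <= wbz_height_ft_agl <= high:
        return "large hail most likely"
    low, high = hail_range
    if low <= wbz_height_ft_agl <= high:
        return "surface hail favored"
    return "outside quoted hail ranges"


for height in (9000, 6000, 13000):
    print(height, wbz_hail_category(height))
```

Passing station-specific ranges through the two keyword arguments keeps the location dependence explicit instead of hard-coding a single set of thresholds.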


7.5  Development  of  a  Lightning  Index 

Finding an improved range of WBZ height values for hail occurrence by location would be useful, but in relation to this study, another application to consider is the recent use of WBZ heights in the study of lightning occurrence. Theory on the origin of lightning suggests that the process of collision and coalescence of frozen particles in thunderstorms is the primary mechanism for the charge separation that produces lightning (Dye, 1990). The development of a new "Lightning Index" is currently ongoing. Stuart et al. (1998) suggest that a "Lightning Index" would likely be based on specific thresholds of meteorological parameters, such as stability indices, and offer some form of prediction capability for the production and frequency of lightning on a daily basis. Additionally, their suggestion of using CAPE and LI to indicate the potential strength of updrafts and instability, combined with WBZ heights, should be studied. This would provide information that may indicate the potential for the production of frozen particles, which is thought to be important to the formation of lightning based on the theory of Dye (1990).

Stuart et al. (1998) suggest the use of CAPE and LI, but decision tree results from this study identify SSI in the northern region of the study, KI in the southern region, and LI in the high western plains region as the most significant predictors of the occurrence of CG lightning events. Perhaps the development of a "Lightning Index" should instead consider the significant indices found in this study as predictors, since they also account for geographic location. The results of this study also suggest that more unstable threshold values of the indices are required when predicting the frequency (or amount) of CG lightning expected.
7.6 Implementation of Results


The  results  of  this  study  using  classification  and 
regression  trees  were  significant  enough  to  implement 
immediately  as  a  forecast  tool  for  the  operational  weather 
forecast  environment.  Appendix  A  of  this  study  is  written 
as  a  "ready-to-use"  forecast  tool  for  weather  forecasters. 
It  is  suggested  that  Air  Force  Weather  units  in  the  Midwest 
U.S.  use  this  "innovative"  forecast  tool  immediately  for 
forecasting  CG  lightning  activity. 




Appendix  A:  Optimal  Decision  Tree  Maximized  Model  Results 


This appendix is written as a stand-alone forecast tool taken from the thesis research results of Capt. Ken Venzke, Air Force Institute of Technology, Wright-Patterson AFB, OH. It summarizes the official decision tree results to assist forecasters in determining the probability of lightning activity or non-activity for individual upper-air sounding locations in the Midwest U.S. This forecast tool is valid for the "active" months of May to September. The stability indices determined as the most significant by this study were the Showalter (SSI), K-Index (KI), Total Totals (TTI), and Lifted Index (LI).

First, a brief description is given of how to interpret the results, followed by the official results in graphical and tabular form for both 00Z and 12Z valid sounding times.

To begin, an example tabular summary is given in Table A-1, along with the same summary in graphic form in Figures A-1 and A-2. The two upper-air sounding locations are OAX (Omaha, NE) and TOP (Topeka, KS). The number in parentheses next to each location is the total number of observations surveyed. It should be noted that only the active months May to September from 1993 to 2000 were assessed for this study. There are two categories, the occurrence and non-occurrence of lightning events (T-Storm/No T-Storm), and for each the number (N) of cases and the probability (P) are listed. The results are also inclusive, which is the reason for the "if", "&", and a few "or if" statements labeled next to each stability index threshold. This inclusive approach means that each probability applies only when the listed index threshold is met together with all of the thresholds listed above it. Each combination listed leads to an increased probability of categorical occurrence, but is valid only if every condition occurs together with the initial index threshold value.

The example for OAX in Table A-1 should read as follows: There were a total of 800 observations available from 1993 to 2000. The most significant stability index threshold at this location was SSI<1.3; when it was met, a thunderstorm occurred within 50 nm of the station 66% of the time during the valid 12-hour sounding time period. This probability increased to 78% when both SSI<1.3 and KI>30.5 occurred. Finally, the maximum probability found for a T-Storm event at OAX was 87%, with the additional requirement that TTI>50.1 occur in combination with the other two thresholds. Alternatively, the maximum probability (91%) found for a non-event (No T-Storm) occurs when SSI>1.3, LI>2.5, and KI<11.1 occur inclusively.
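The inclusive reading of the OAX example can be written out as nested conditions. This is only a sketch of how a forecaster might encode the 00Z OAX thresholds and probabilities quoted above; the function name is hypothetical, and assigning the boundary case SSI = 1.3 to the No T-Storm branch is an assumption, since the text does not address it.

```python
def oax_00z_lightning_probability(ssi, ki, tti, li):
    """Return (category, probability) from the 00Z OAX example thresholds.

    Each deeper condition only counts if every condition above it holds,
    mirroring the "if" / "&" rows of the tabular summary.
    """
    if ssi < 1.3:
        probability = 0.66
        if ki > 30.5:
            probability = 0.78
            if tti > 50.1:
                probability = 0.87
        return ("T-Storm", probability)

    # Assumption: SSI exactly equal to 1.3 is treated as the No T-Storm branch.
    probability = 0.73
    if li > 2.5:
        probability = 0.81
        if ki < 11.1:
            probability = 0.91
    return ("No T-Storm", probability)


print(oax_00z_lightning_probability(ssi=0.0, ki=32.0, tti=51.0, li=-2.0))
print(oax_00z_lightning_probability(ssi=3.0, ki=10.0, tti=40.0, li=4.0))
```

Note that a high TTI alone does not raise the probability in this sketch: with SSI<1.3 but KI below 30.5, the function stays at 0.66, which is the meaning of the inclusive approach.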
Table A-1. 00Z tabular summary classification example.

TOP (930)
T-Storm          N     P
if  SSI<2.2      550   0.63
&   KI>22.9      441   0.69
&   TTI>49       120   0.74
&   KI>35.2      130   0.82
No T-Storm       N     P
if  SSI>2.2      380   0.76
&   KI<10.6      124   0.9

OAX (800)
T-Storm          N     P
if  SSI<1.3      377   0.66
&   KI>30.5      209   0.78
&   TTI>50.1     101   0.87
No T-Storm       N     P
if  SSI>1.3      423   0.73
&   LI>2.5       296   0.81
&   KI<11.1      106   0.91

Figure A-1. 00Z graphical classification results example for T-Storm probability.

Figure A-2. 00Z maximized classification tree example for No T-Storm probability.


The official graphic results (Figures A-3 and A-4) allow the forecaster to "visualize" the results geographically. In summary, the highest probabilities, roughly 75-85% for a lightning event, were obtained when TTI>50.0 was combined with other indices at many locations, but the most significant index (which is always listed first) and its threshold value, which must be met first, varied by location. Interestingly, the indices and threshold values determined for each location follow a predictable pattern. The LI was determined best for use at the high plains locations (RAP and AMA) for both sounding times, in part due to their higher station elevations. Next, the SSI was most significant for the northern Midwest region of the study at 00Z. Further south, in the more moist regions of the Midwest, the KI was the most significant at 00Z, most likely because the KI provides an extra measurement of moisture at the 700 mb level. This extra measurement at 700 mb also proved significant for the 12Z sounding times, since the KI was the predominant index for most locations at 12Z. This can be explained by the fact that the morning temperature inversions commonly found at 12Z during the active months (May to September) in the Midwest could not be resolved by the 850 mb temperature/moisture measurements used in the other indices.

The mean strike threshold results are displayed in graphical form in Figures A-7 and A-8, and in tabular form in Tables A-4 and A-5. The classification results allow forecasters to determine the probability of a lightning event and, if a forecaster determines that lightning is expected, the mean strike results allow the forecaster to determine the potential frequency or "amount" of lightning activity to be expected.

The mean strike results display the most significant stability indices and threshold values for each location, where values above or below the threshold gave a 300-500% increase or decrease in the mean lightning activity for each event. Only events where lightning did occur were analyzed under the mean strike results, since the classification results were first used to determine whether an event was expected.
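To make the idea of a mean strike threshold concrete, the sketch below finds the single index threshold that best separates low and high strike counts using a least-squares split, which is the usual criterion for a one-level regression tree. This is an assumption for illustration: the exact splitting criterion of the software used in the study is not stated here, and the LI values and strike counts are hypothetical toy numbers, not study data.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the group mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(index_values, strike_counts):
    """Return (threshold, mean_below, mean_above) for the least-squares split."""
    pairs = sorted(zip(index_values, strike_counts))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal index values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        low = [count for _, count in pairs[:i]]
        high = [count for _, count in pairs[i:]]
        error = sum_squared_error(low) + sum_squared_error(high)
        if best is None or error < best[0]:
            best = (error, threshold, sum(low) / len(low), sum(high) / len(high))
    if best is None:
        raise ValueError("need at least two distinct index values")
    return best[1:]


# Hypothetical lightning events: LI at sounding time and CG strike count.
li_values = [-6, -5, -3, -1, 0, 2]
strikes = [1500, 1400, 500, 400, 450, 300]
print(best_split(li_values, strikes))
```

For this toy sample the split falls at LI = -4.0, with the more unstable (more negative) side carrying the much higher mean strike count, which is the same pattern the mean strike tables report.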

The number in parentheses next to OUN in Table A-2 indicates that a total of 437 lightning strike events were available for the mean strike analysis.


Table A-2. Sample mean strike results at 00Z for OUN and LBF.

OUN (437)
mean   N     split          N     mean
439    334   LI = -4.3      103   1455
227    129   TTI = 45.8     205   572

LBF (516)
mean   N     split          N     mean
409    386   SSI = -4.1     130   1191
279    267   SSI = -1.83    119   699

The primary index and threshold value determined was LI = -4.3. This threshold value best discriminates between a more active and a less active lightning strike event. At the split (LI = -4.3), the mean lightning strike count for the N=103 events with LI <= -4.3 was 1455. It would be confusing to say that values greater than the threshold are on the right in this case; rather, the higher mean lightning strike counts are on the right, corresponding to more unstable stability index values. In other words, more unstable LI values are more negative. The mean lightning strike count at OUN was 1455 when LI values were less than -4.3 and 439 when LI values were greater than -4.3. This indicates that the mean lightning strike count on days when LI is less than -4.3 was over 300% of the count on other days (1455 versus 439). Another index and threshold value exhibiting potential as a useful predictor was TTI = 45.8, which was also considered significant for this location. In this case it is less confusing to say that TTI values greater than 45.8 signify a more active lightning strike event, since higher TTI values are more unstable. Here, the mean lightning strike count was over 200% of the count for lower TTI values (572 versus 227).
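The percentages quoted for OUN follow directly from the tabulated means. The short calculation below is a worked check only; the helper names are hypothetical, and it distinguishes the ratio of the two means from the percentage increase, which are easily confused.

```python
def percent_of(lower_mean, higher_mean):
    """Higher mean expressed as a percentage of the lower mean."""
    return 100.0 * higher_mean / lower_mean


def percent_increase(lower_mean, higher_mean):
    """Increase from the lower mean to the higher mean, in percent."""
    return 100.0 * (higher_mean - lower_mean) / lower_mean


# Mean strike counts for 00Z OUN taken from Table A-2.
li_split = (439, 1455)   # LI = -4.3
tti_split = (227, 572)   # TTI = 45.8

for label, (low, high) in (("LI", li_split), ("TTI", tti_split)):
    print(label, round(percent_of(low, high)), round(percent_increase(low, high)))
```

For the LI split the higher mean is about 331% of the lower one (an increase of about 231%), and for the TTI split it is about 252% (an increase of about 152%).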

The ability of a weather forecaster to predict the probability of the occurrence or non-occurrence of lightning generally exceeded the 80-90% probability level for the locations analyzed, which has far-reaching implications. Additionally, using stability indices to determine the expected amount of lightning strike counts is unique. The results of this study should prove to be a useful forecast tool in the operational environment.
Figure A-7. 00Z mean strike threshold results.
Table A-3. 00Z Lightning Probability Results.

OUN (1009)
T-Storm          N     P
if  KI>25.2      605   0.56
&   LI<-1.1      402   0.63
&   KI>35        157   0.74
No T-Storm       N     P
if  KI<25.2      404   0.75
&   TTI<46.7     293   0.8
&   KI<10.8      107   0.87

LBF (1037)
T-Storm          N     P
if  SSI<1.1      585   0.69
&   KI>30.6      305   0.78
&   TTI>52       147   0.86
No T-Storm       N     P
if  SSI>1.1      452   0.76
&   SSI>5.6      152   0.84

TOP (930)
T-Storm          N     P
if  SSI<2.2      550   0.63
&   KI>22.9      441   0.69
&   TTI>49       120   0.74
&   KI>35.2      130   0.82
No T-Storm       N     P
if  SSI>2.2      380   0.76
&   KI<10.6      124   0.9

OAX (800)
T-Storm          N     P
if  SSI<1.3      377   0.66
&   KI>30.5      209   0.78
&   TTI>50.1     101   0.87
No T-Storm       N     P
if  SSI>1.3      423   0.73
&   LI>2.5       296   0.81
&   KI<11.1      106   0.91

SGF (723)
T-Storm          N     P
if  KI>30.7      239   0.73
&   KI>34.2      121   0.84
No T-Storm       N     P
if  KI<30.7      484   0.69
&   KI<13.35     158   0.88

RAP (946)
T-Storm          N     P
if  LI<1.1       459   0.75
&   SSI<-3.3     128   0.91
No T-Storm       N     P
if  LI>1.1       487   0.73
&   LI>2.6       339   0.81
&   LI>6.7       111   0.88

FWD (816)
T-Storm          N     P
if  KI>30.5      346   0.61
&   KI>37.1      110   0.8
No T-Storm       N     P
if  KI<30.5      470   0.75
&   TTI<41.2     132   0.92

DDC (1012)
T-Storm          N     P
if  SSI<-0.5     546   0.69
&   KI>30.8      366   0.77
&   SSI<-2.4     248   0.82
No T-Storm       N     P
if  SSI>-0.5     466   0.7
&   KI<25.0      261   0.78
&   SSI>4.3      132   0.86

DVN (731)
T-Storm          N     P
if  KI>25.5      291   0.7
&   SSI<-0.3     133   0.86
No T-Storm       N     P
if  KI<25.5      440   0.78
&   KI<17        310   0.83
&   LI>3.8       203   0.89

LZK (1006)
T-Storm          N     P
if  KI>27.3      491   0.65
&   TTI>45.7     255   0.8
&   KI>33.4      153   0.84
No T-Storm       N     P
if  KI<27.3      515   0.77
&   LI>0.4       310   0.86
&   KI<11.9      157   0.89

SHV (749)
T-Storm          N     P
if  TTI>44.1     451   0.6
&   KI>26.6      405   0.7
&   KI>35.9      116   0.84
No T-Storm       N     P
if  TTI<44.1     298   0.79
&   LI>1.2       143   0.87

[station identifier illegible]
T-Storm          N     P
if     LI<-0.4   479   0.75
&      KI>38.4   139   0.87
or if  KI<38.4   340   0.69
&      TTI>51.7  154   0.77
No T-Storm       N     P
if     LI>-0.4   500   0.72
&      LI>3.1    171   0.87
or if  LI<3.1    329   0.64
&      LI>1.4    109   0.71

Table A-3. 12Z Lightning Probability Results.

OUN (1003)
T-Storm          N     P
if  KI>28.4      497   0.62
&   TTI>46.2     332   0.69
&   KI>35.4      167   0.79
No T-Storm       N     P
if  KI<28.4      506   0.77
&   KI<15.5      182   0.89

LBF (1031)
T-Storm          N     P
if  LI<4.0       691   0.49
&   KI>27.5      412   0.58
&   KI>32.9      162   0.66
No T-Storm       N     P
if  LI>4.0       340   0.85
&   KI<12.0      100   0.95

TOP (930)
T-Storm          N     P
if  KI>23.3      514   0.61
&   KI>32.9      239   0.77
&   TTI>47.9     137   0.84
No T-Storm       N     P
if  KI<23.3      411   0.84
&   KI<7.7       136   0.94

OAX (800)
T-Storm          N     P
if  KI>24.2      380   0.6
&   KI>32.0      188   0.71
No T-Storm       N     P
if  KI<24.2      422   0.81
&   SSI>3.7      307   0.87
&   LI>10        100   0.93

SGF (714)
T-Storm          N     P
if  KI>23.8      374   0.66
&   KI>33.2      143   0.87
No T-Storm       N     P
if  KI<23.8      374   0.82
&   LI>7.2       120   0.97

RAP (927)
T-Storm          N     P
if  LI<3.0       484   0.71
&   KI>25.4      292   0.78
&   LI<-0.6      119   0.88
No T-Storm       N     P
if  LI>3.0       443   0.76
&   LI>8.3       136   0.93

FWD (814)
T-Storm          N     P
if  KI>27.8      440   0.58
&   KI>34.8      179   0.73
No T-Storm       N     P
if  KI<27.8      374   0.82
&   SSI>2.95     174   0.93

DDC (994)
T-Storm          N     P
if  KI>21.3      726   0.47
&   KI>32.7      306   0.61
&   KI>37.4      106   0.72
No T-Storm       N     P
if  KI<21.3      268   0.89
&   TTI<40.5     118   0.93

DVN (728)
T-Storm          N     P
if  KI>25.2      284   0.65
&   LI<1.1       169   0.76
No T-Storm       N     P
if  KI<25.2      444   0.83
&   LI>3.2       327   0.89
&   TTI<38.2     218   0.94

LZK (1224)
T-Storm          N     P
if  SSI<2.6      660   0.66
&   LI<1.4       543   0.72
&   KI>29.6      332   0.81
&   SSI<2.5      100   0.92
No T-Storm       N     P
if  SSI>2.6      564   0.81
&   KI<10.9      273   0.94

[station identifier illegible]
T-Storm          N     P
if  KI>26.4      402   0.72
&   KI>33.5      179   0.87
No T-Storm       N     P
if  KI>26.4      346   0.77
&   LI>-1.4      237   0.87
&   TTI<37       101   0.93

[station identifier illegible]
T-Storm          N     P
if  LI<1.9       646   0.57
&   KI>27.4      460   0.65
&   KI>35.1      156   0.76
No T-Storm       N     P
if  LI>1.9       286   0.86
&   LI>5.1       162   0.94

FSI (195)
T-Storm          N     P
if  KI>30.6      86    0.62
No T-Storm       N     P
if  KI>30.6      109   0.77


FWD (328)
mean   N     split          N     mean
282    228   TTI = 50.4     100   1513

DDC (517)
mean   N     split          N     mean
745    394   LI = -4.3      123   2008
407    148   SSI = -0.5     246   948
798    142   TTI = 50.1     104   1153

SHV (334)
mean   N     split          N     mean
360    234   LI = -4.3      100   1182
169    110   TTI = 45.5     124   530

AMA (498)
mean   N     split          N     mean
566    354   LI = -2.9
461    281   KI = 37.8


Table A-5. 12Z Mean Lightning Strike Results.

OUN (423)
mean   N     split          N     mean
484    251   SSI = -1.7     172   939
609    112   LI = 0.3       139   384

LBF (393)
mean   N     split          N     mean
171    293   KI = 33.3      100   432
122    112   LI = 0.3       181   249

OAX (308)
mean   N     split          N     mean
235    189   SSI = -0.68    119   659

TOP (378)
mean   N     split          N     mean
479    260   SSI = -2.13    118   1388
294    128   KI = 30.1      132   659

RAP (402)
mean   N     split          N     mean
348    235   SSI = -2.2     167   935

SGF (305)
mean   N     split          N     mean
399    183   KI = 33.3      122   750

FWD (326)
mean   N     split          N     mean
442    214   LI = -2.8      112   1107

DDC (373)
mean   N     split          N     mean
172    196   LI = -1.0      177   427
244    51    KI = 33.5      126   501

DVN (261)
mean   N     split          N     mean
304    100   TTI = 44.6     161   1330
288          KI = 29.2            656

LZK (541)
mean   N     split          N     mean
458    300   LI = -1.7      241   1235
674          KI = 31.9            1634

SHV ()
N/A  See LZK Results

AMA ()
N/A  See FWD Results
Appendix B: Decision Tree Full Model Cross-Validation Results


The results of the full models are included in Appendix B for comparison, to provide evidence of how well the decision tree results of this study cross-validate. Cross-validation here means comparing a smaller study sample to the maximized database; if similar results are found for the smaller sample and the maximized sample, then the results cross-validate well and are considered significant. Full model in this case means that all of the indices were included in the decision tree model run, which equated to a 30-40% smaller database due to missing data in a few of the stability indices that have already been determined to be less significant predictors (KO, CAPE, and SWEAT). The maximized model results do not include the less significant indices and are therefore based on the larger, "maximized" database.

The following are some examples of very significant, almost identical, cross-validations from the textual summary output. Table B-1 compares the maximized model of 00Z LBF to the full model. Again, the total number of cases included is in parentheses next to the location identifier. The full model for 00Z LBF contains 631/1037 = 0.61 of the cases, or 39% fewer, yet striking similarities exist in the first tree node threshold (SSI<1.1 for T-Storm or SSI>1.1 for No T-Storm). The first node is always deemed the most significant because the remaining indices all depend (inclusively) upon its condition, which must be met first for them to be valid. Again, when the number of cases available drops to near 100 or less, accuracy becomes questionable. KO<3.3 for the full model No T-Storm category contains only 51 cases, so its inclusion in the maximized model results is not recommended. Next, compare Tables B-2 and B-3, and notice the similarities between the maximized tree model results and the full model. The reader is encouraged to compare the cross-validations of the other locations, as well as the 12Z sounding results of the full model in this appendix, with the maximized model results in Appendix A.
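The case-count arithmetic used in these comparisons can be checked with a short calculation. The sketch below is illustrative only: the helper names are hypothetical, and the cutoff of 100 cases is taken from the rule of thumb stated above rather than from any formal test.

```python
def percent_fewer_cases(full_model_cases, maximized_cases):
    """Percentage by which the full model sample is smaller."""
    return 100.0 * (1.0 - full_model_cases / maximized_cases)


def enough_cases(node_cases, minimum_cases=100):
    """Rule of thumb from the text: nodes near 100 cases or fewer are questionable."""
    return node_cases >= minimum_cases


# Case counts quoted for the 00Z comparisons (full model, maximized model).
lbf = (631, 1037)
lzk = (723, 1006)

print(round(lbf[0] / lbf[1], 2), round(percent_fewer_cases(*lbf)))
print(round(percent_fewer_cases(*lzk)))
print(enough_cases(51))  # the KO<3.3 node in the full LBF model
```

The LBF figures reproduce the 0.61 fraction and the 39% reduction quoted above, and the 51-case KO<3.3 node falls below the cutoff.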


Table B-1. Maximized versus Full classification tree results for 00Z LBF.

LBF (1037)                     vs.   LBF (631)
T-Storm          N     P             T-Storm          N     P
if  SSI<1.1      585   0.69          if  SSI<1.1      355   0.7
&   KI>30.6      305   0.78          &   SSI<-3.4     119   0.86
&   TTI>52       147   0.86
No T-Storm       N     P             No T-Storm       N     P
if  SSI>1.1      452   0.76          if  SSI>1.1      276   0.75
&   SSI>5.6      152   0.84          &   SSI>5.2      101   0.85
                                     &   KO<3.3       51    0.96
Table B-2. Maximized versus Full classification tree results for 00Z FWD.

FWD (816)                      vs.   FWD (471)
T-Storm          N     P             T-Storm          N     P
if  KI>30.5      346   0.61          if  KI>27.05     288   0.59
&   KI>37.1      110   0.8           &   TTI>41.10    223   0.68
                                     &   TTI>47       108   0.65
No T-Storm       N     P             No T-Storm       N     P
if  KI<30.5      470   0.75          if  KI<27.05     183   0.78
&   TTI<41.2     132   0.92          &   TTI<41.10    61    0.92
                                     &   KI<23.8      160   0.77

Table B-3. Maximized versus Full classification tree results for 00Z LZK.

LZK (1006)                     vs.   LZK (723)
T-Storm          N     P             T-Storm          N     P
if  KI>27.3      491   0.65          if  KI>27.3      364   0.67
&   TTI>45.7     255   0.8           &   TTI>45.7     191   0.81
&   KI>33.4      153   0.84          &   TTI>49.6     60    0.95
No T-Storm       N     P             No T-Storm       N     P
if  KI<27.3      515   0.77          if  KI<27.3      359   0.77
&   LI>0.4       310   0.86          &   LI>0.4       213   0.85
&   KI<11.9      157   0.89

Again, when more than 100 cases are available for both models, they typically cross-validate very well. Notice that the 00Z LZK results in Table B-3 are nearly identical, yet the full model for 00Z LZK is 28% smaller. The cross-validation results should be very compelling to skeptics and should be considered more than just coincidence.

Assessing how well the regression tree results cross-validate is a little different in that the regression tree model includes only the cases with CG lightning events, thus substantially reducing the number of cases available for the regression tree model fit. Also, the regression tree model results are not inclusive and categorical, but instead independent and numerical. As with the classification trees, though, the most important or significant index and threshold value is listed first because of the higher number of cases available. Table B-4 below is an example regression tree cross-validation for 00Z OUN. There are approximately 25% fewer cases in the full model. Again, the results of the maximized and full models are strikingly similar.

Table B-4. Maximized versus Full regression tree results for 00Z OUN.

OUN (325)
mean   N     split          N     mean
432    268   LI = -4.9      57    1722
162    102   TTI = 45.8     166   598
424    126   [illegible]    40    1146

OUN (437)
mean   N     split          N     mean
439    334   LI = -4.3      103   1455
227    129   TTI = 45.8     205   572
Table B-6. 00Z Full Model Classification Tree Results.

DDC (569)
T-Storm          N     P
if  SSI<-1.0     286   0.75
&   KI>30.8      186   0.84
&   KI>35.0      128   0.88
No T-Storm       N     P
if  SSI>-1.0     283   0.64
&   SWEAT<160    145   0.76
&   KO<-0.1      91    0.82

FWD (471)
T-Storm          N     P
if  KI>27.05     288   0.59
&   TTI>41.10    223   0.68
No T-Storm       N     P
if  KI<27.05     183   0.78
&   TTI<41.10    61    0.92

LZK (723)
T-Storm          N     P
if  KI>27.3      364   0.67
&   TTI>45.7     191   0.81
&   TTI>49.6     60    0.95
No T-Storm       N     P
if  KI<27.3      359   0.77
&   LI>0.4       213   0.85

DVN (437)
T-Storm          N     P
if  SSI<1.4      159   0.72
&   KI>30.1      98    0.83
No T-Storm       N     P
if  SSI>1.4      278   0.75
&   KI<25.5      222   0.8
&   CAPE>5.2     146   0.87

SHV ()
N/A  See LZK results

AMA (478)
T-Storm          N     P
if  LI<-0.4      255   0.77
&   SSI<-2.6     102   0.86
No T-Storm       N     P
if  LI>-0.4      233   0.71
&   LI>2.1
Table B-7. 12Z Full Model Classification Tree Results.

OUN (723)
T-Storm          N     P
if  KI>26.8      419   0.61
&   LI<-0.71     256   0.7
No T-Storm       N     P
if  KI<26.8      304   0.79
&   CAPE<1812    253   0.88

OAX (500)
T-Storm          N     P
if  KI>24.2      257   0.63
&   KI>33.0      107   0.78
&   KI>36.5      51    0.84
No T-Storm       N     P
if  KI<24.2      243   0.81
&   TTI<44.5     189   0.86

RAP (477)
T-Storm          N     P
if  LI<2.9       254   0.72
&   LI>26.9      144   0.82
No T-Storm       N     P
if  LI>2.9       223   0.78
&   KI<17.2      102   0.9

LBF (521)
T-Storm          N     P
if  LI<4.0       443   0.51
&   KI>25.4      316   0.58
&   LI>-10.5     266   0.61
&   CAPE>251.2   162   0.7
No T-Storm       N     P
if  LI>4.0       178   0.85
&   KI<23.2      128   0.89

TOP (570)
T-Storm          N     P
if  SSI<0.5      277   0.69
&   KI>33.2      110   0.82
No T-Storm       N     P
if  SSI>0.5      293   0.72
&   KI<25.2      203   0.81
&   LI>4.3       116   0.89

SGF (714)
N/A  See LZK results
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Table  B-8.  12Z  Full  Model  Classification  Tree  Results. 


DDC  (676) 
T-Storm 

N 

P 

if 

Kl>32.8 

208 

0.64 

& 

Ll<  -0.1 

158 

0.69 

& 

SSK  -3.7 

56 

0.71 

No  T-Storm 

N 

P 

if 

KK23 

207 

0.85 

& 

TTK46.9 

156 

0.89 

FWD  (510) 

T-Storm 

N 

P 

if 

Kl>27.8 

291 

0.58 

& 

Kl>34.75 

134 

0.75 

No  T-Storm 

N 

P 

if 

KK27.8 

219 

0.81 

& 

KO>  -13.3 

168 

0.9 

& 

TTK42.55 

105 

0.96 

LZK  (719)

T-Storm       N    P
if  SSI<1.2   371  0.73
&   KI>23     318  0.78
&   CAPE>679  255  0.82
&   KI>30.0   203  0.85
&   SSI<-2.6  69   0.93

No  T-Storm   N    P
if  SSI>1.2   348  0.76
&   KI>9.6    99   0.95

DVN  (350)

T-Storm       N    P
if  KI>23.2   180  0.66
&   CAPE>330  113  0.77

No  T-Storm   N    P
if  KI<23.2   170  0.84
&   TTI<38.2  90   0.94

AMA  (591)

T-Storm      N    P
if  LI<1.3   360  0.63
&   KI>28.6  255  0.71
&   KI>34.9  104  0.8

No  T-Storm  N    P
if  LI>1.3   231  0.79
&   KI<21.3  104  0.88

SHV  (476)

T-Storm       N    P
if  KI>33.5   134  0.87
&   KO<-11.0  84   0.93

No  T-Storm   N    P
if  KI<33.5   134
RESULTS
Table  B-9.  00Z  Full  Model  Regression  Tree  Results.
See  LZK  results
276  147
100  1138

mean  N    DDC  (317)  N   mean
656   238  SSI=-4.0    79  1913
492   196  KI=37.7     43  1423
377   151  SWEAT=260   45  878

mean  N    FWD  (328)  N    mean
316   186  SSI=-1.8    131  833
167   61   CAPE=1328   99   408

mean  N    DVN  (185)  N   mean
512   140  SSI=-1.5    45  2017
268   82   LI=-0.2     58  858

mean  N    LZK  (327)  N   mean
261   313  SWEAT=252   66  1654
228   216  CAPE=2377   45  723

mean  N    AMA  (261)  N   mean
550   208  SWEAT=312   53  1636
323   125  KI=35.4     83  893

mean  N    SHV  (261)  N    mean
415   149  SWEAT=238   112  1244
Table  B-10.  12Z  Full  Model  Regression  Tree  Results.
KI=32.4  110  1153

mean  N    DDC  (263)  N   mean
214   199  LI=-2.3     64  509

mean  N    FWD  (211)  N   mean
493   159  LI=-3.7     52  1422
383   107  CAPE=1676   52  720

mean  N  LZK  (354)
222
514

mean  N    DVN  (146)  N   mean
541   102  CAPE=1541   44  1655
268   40   TTI=43.8    62  718

mean  N    SHV  (164)  N    mean
556   76   SSI=-1.9    76   1984

mean  N    AMA  (274)  N    mean
204   126  SSI=-0.78   148  661
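
The regression tree rows above can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the original S-Plus analysis. It assumes each row reads as (left mean, left N, split, right N, right mean), and that a second row subdivides the left node of the row before it. Under that reading, the case-weighted mean of the two children in the second DVN (185) row of Table B-9 should reproduce the left node of the first row. The helper name `weighted_mean` is hypothetical.

```python
def weighted_mean(groups):
    """Pool (n_cases, mean) pairs into one case-weighted mean."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# DVN (185) rows as printed in Table B-9 (00Z):
# (left mean, left N, split, right N, right mean)
first = (512, 140, "SSI=-1.5", 45, 2017)
second = (268, 82, "LI=-0.2", 58, 858)

# Assumption: the second split subdivides the left node of the first split.
rebuilt_n = second[1] + second[3]
rebuilt_mean = weighted_mean([(second[1], second[0]), (second[3], second[4])])

# Pooling both sides of the first split gives the mean over all 185 cases.
root_n = first[1] + first[3]
root_mean = weighted_mean([(first[1], first[0]), (first[3], first[4])])

print(rebuilt_n, round(rebuilt_mean), root_n, round(root_mean))
# prints: 140 512 185 878
```

The rebuilt node (N = 140, mean about 512) matches the left side of the first row, which supports reading successive rows as nested splits. The same check can be applied to any station whose rows are fully legible.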


Appendix  C:  Final  S-Plus®  Decision  Tree  Output 


The graphics on the following pages are the combined,
optimal classification and regression tree outputs for
each location in this study. For each location, the S-Plus
graphical tree and its textual summary output are displayed
together. The information contained in these decision trees
is the basis from which the results of the maximized model
in Appendix A were developed.
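
As a reading aid, one path of a classification tree can be encoded as an ordered list of conditions. The sketch below is a hypothetical illustration, not the S-Plus output itself. It uses the FWD (510) "No T-Storm" path from Table B-8, read from the scan as KI<27.8, KO>-13.3 and TTI<42.55. It assumes N is the number of cases reaching a node and P is the fraction of those cases in the labelled class. The function name `deepest_match` and the sample sounding values are invented for the example.

```python
import operator

OPS = {"<": operator.lt, ">": operator.gt}

# FWD (510) "No T-Storm" path as read from Table B-8:
# (index, comparison, threshold, N, P)
FWD_NO_TSTORM = [
    ("KI", "<", 27.8, 219, 0.81),
    ("KO", ">", -13.3, 168, 0.90),
    ("TTI", "<", 42.55, 105, 0.96),
]

def deepest_match(path, sounding):
    """Walk the path in order; return (N, P) of the last condition met, else None."""
    matched = None
    for index, op, threshold, n, p in path:
        if not OPS[op](sounding[index], threshold):
            break  # later conditions only apply if the earlier ones hold
        matched = (n, p)
    return matched

# Hypothetical sounding values, for illustration only.
sounding = {"KI": 22.0, "KO": -5.0, "TTI": 45.0}
print(deepest_match(FWD_NO_TSTORM, sounding))
# prints: (168, 0.9)
```

Here the sounding satisfies the first two conditions but not the third, so the deepest node reached is the one with N = 168 and P = 0.90.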